Abstract

We study a class of two-player repeated games with incomplete information and informational externalities. In these games, two states are chosen at the outset, and players get private information on the pair, before engaging in repeated play. The payoff of each player only depends on his 'own' state and on his own action. We study to what extent, and how, information can be exchanged in equilibrium. We prove that provided the private information of each player is valuable for the other player, the set of sequential equilibrium payoffs converges to the set of feasible and individually rational payoffs as players become patient.

Whether and how to acquire information is a question faced by most decision
makers. In statistical decision problems, the decision maker tries to learn the value
of an unknown parameter, and he can sample from an exogenous population at a
fixed cost per draw. In other contexts, information is held by strategic agents. In 
signalling games for instance, a player holding payoff-relevant private information 
tries to influence the action choice of an uninformed party. There, the rationale 
for disclosing/hiding information is that the uninformed party's action affects the 
payoff of the informed player. We here study a class of repeated games with private 
information, in which there is no such direct strategic interaction: as in setups of
social learning, or of strategic experimentation, payoffs do not depend upon other 
players' actions. However, players hold private information that has value to other 
players. Our goal is to understand to what extent information can be exchanged at 
equilibrium along the play, assuming communication is costly. This assumption of 
pure informational externalities plays a dual role. Obviously, it simplifies the analysis 
of the model and allows one to study the exchange of information in isolation from other
strategic considerations. But it leads to a game in which we might least expect 
exchange of information, and any positive result in this setup might potentially open 
the way for the analysis of other setups. 

As an illustration, consider the following game. There are two biased coins, $C_1$
and $C_2$ (say, with parameter $\theta > 1/2$), which are tossed independently once at the outset
of the game. Each of two players, $i = 1, 2$, has to repeatedly guess the outcome of
coin $C_i$. A correct guess yields a payoff of one, while an incorrect one yields zero, and
successive payoffs are discounted. If past payoffs are not observed, there is no role
for direct inference, and it is natural to expect that player $i$ will repeatedly 'guess'
the most likely outcome, for an expected payoff of $\theta$. Assume that, once coins are
drawn, each player $i$ gets to observe the outcome of coin $C_j$, where $j \neq i$, but that
cheap talk is excluded. This private information has no 'direct' value, but it might
have a strategic value, because it is valuable to the other player. In this game, is
there an equilibrium payoff that improves upon $(\theta, \theta)$? While stylized, this game is
similar to the situation faced by executives of two different firms, who hold private
information on their own firm. Trading the stock of one's own firm is illegal at times
when private information is most valuable, and the executives may be tempted to
implement some implicit collusive scheme of information exchange through time.

We start with a few simple observations. Since cheap talk is assumed away, exchange
of information must take place through actions, by 'encoding' privately held
information into actions, that is, by conditioning the action choice of player $i$ on the
outcome of coin $C_j$. Plainly, there is no equilibrium in which each player $i$ fully
'discloses' the outcome of $C_j$ at stage 1. Indeed, once informed of the outcome of $C_2$,
player 2 can make the correct guess in all subsequent periods, and has no incentive
whatsoever to incur the cost of disclosing information to player 1, including in stage
1. But then, player 1 would not be willing to play the myopically suboptimal action
in stage 1. More generally, private information is here an asset to be exchanged for
private information, at a cost. On the one hand, a player cannot disclose information
without assigning positive probability to his myopically suboptimal action, and
thereby incurring a cost.^1 On the other hand, a player is not willing to play a
myopically suboptimal action unless he expects to be rewarded with valuable information
in return: no player wants to be the last one to disclose information. Not
surprisingly, when the horizon is finite we prove that $(\theta, \theta)$ is the unique equilibrium payoff.
Having an infinite horizon raises the possibility of a gradual, open-ended exchange
of information. Since, however, the total 'amount' of information to be exchanged is
bounded, the feasibility of such a process is not ensured.

Our model is a generalization of the stylized example. Two 'states', $s$ and $t$, are
drawn independently, and players get private information on both $s$ and $t$.^2 Next,
the players repeatedly choose actions, which are publicly disclosed. The crucial
assumption we make is that a player's payoff only depends on 'his' state, and on his
own action.

We prove that, provided that the information held by each player is valuable to
the other player, the limit set (as $\delta \to 1$) of sequential equilibrium payoffs coincides
with the set of all feasible payoffs that are at least equal to the initial, myopically optimal
payoffs. In the simple example discussed above, this limit set is thus equal to the
set $[\theta, 1] \times [\theta, 1]$, where $\theta$ is the parameter of the coins. Not only can information be
shared, but the rate of information exchange can be arbitrarily high relative to the
discount rate. Our equilibria share the
following features. Players start by reporting truthfully whatever information they 
received on their own state. This leads to a continuation game in which no player 
holds private information on his own state. As a result, each player is able to compute 
how costly it is for the other player to play his suboptimal action, and is therefore able 
to adjust accordingly the amount of information he discloses as a 'reward'. Players 
next exchange information in an open-ended manner. The analysis presents two 
main and mostly independent difficulties. One is to design open-ended equilibrium 
processes, according to which information is exchanged. In our construction, the bulk 
of information exchange takes place early in the game. Later information disclosure 
only serves as a means to compensate for previously incurred costs. The second 
consists in adjusting this continuation play so as to provide the incentives for truthful 
reporting of one's information on one's own state. 

^1 Unless he is indifferent, but this possibility will play no role.

^2 That is, each player receives information both on his own state and on the other player's state.



Our motivation stems from repeated games with incomplete information. The
literature on such games started with Aumann and Maschler (1966, 1995), and was
extensively developed under the assumption of no discounting; see chapters 5 and 6
in Aumann and Hart (1992). When there is no discounting, communication through
actions becomes costless, and our model becomes trivial. Besides the literature on reputation
models (see Mailath and Samuelson (2006) for a survey), there is only limited work
dealing with discounted repeated games with incomplete information. Recent
contributions are Cripps and Thomas (2003), Peski (2008) and Wiseman (2005).^3 Both
Cripps and Thomas (2003), and Peski (2008) look at games with one-sided informa- 
tion, in which each of the two players knows his own payoff function, and one of the 
two is unsure of the payoff function of the other player. Cripps and Thomas (2003) 
prove that a Folk Theorem type of result holds in the limit where the prior belief 
converges to the case of complete information. Peski (2008) essentially shows that 
all equilibria are payoff-equivalent to equilibria that involve finitely many rounds of 
information revelation. Wiseman (2005) looks at situations of common uncertainty. 
Players share the same information on the underlying state of nature, and refine this 
information by observing actual choices and payoffs. 

Starting with Crawford and Sobel (1982), the huge literature on strategic infor- 
mation transmission and on cheap-talk games addresses issues related to ours. The 
paper that is closest to our work is Aumann and Hart (2003). There, prior to playing 
a game once, two players, one of whom is informed of the true game to be played,
exchange messages during countably many periods. Aumann and Hart (2003) char- 
acterize the set of equilibrium payoffs. Following an example of Forges (1990), they 
show that allowing for an unbounded communication length may increase the set of 
equilibrium payoffs. There are however significant differences with our setup. On 
the one hand, this literature allows the game to exhibit informational and strategic 
interaction as well. On the other hand, information is one-sided, and communication 
is costless. 

Finally, the pattern of information disclosure in our setup is reminiscent of the 
pattern of contributions in dynamic models of public good contributions, see Admati 
and Perry (1991), Marx and Matthews (2000), or in the dynamic resolution of the 
hold-up problem, see Che and Sakovics (2004). More generally, and as Compte and 
Jehiel (2004) argue, the existence of a history-dependent outside option forces equilibrium
concessions to be gradual in many bargaining situations, just as information
disclosure has to be gradual and open-ended here. There are however differences 
between the results on the two models. First, in dynamic models of public good con- 
tributions the evolution of contributions follows a deterministic trend, while in games 
with incomplete information beliefs follow a martingale. A more significant differ- 
ence is the following. In the former models, there is a one-to-one relation between 
the contributed amount and the cost incurred when contributing: the more a player 
contributes, the higher his cost of doing so. Here, the cost of disclosing information
is independent of the amount of information that is disclosed. The reason is that the 
cost of disclosure is incurred when playing a (myopically) suboptimal action, while 
the amount of information is a function of how revealing such an action is. The more 
precise the belief, the higher the cost of disclosing information. 

The paper is organized as follows. Section 1 contains the model and a statement
of our main results. Section 2 presents the main ideas of the proof through a version
of the example discussed above (allowing for private signals). Section 3 is devoted to
concluding comments. Proofs are provided in the Appendix.

1 Model and Main Result 

1.1 The Game 

We study a class of two-player repeated games with incomplete information. At the
outset of the game, a state of the world is realized, and the two players receive private
signals, $l \in L$ and $m \in M$ respectively. At each stage $n \geq 1$, players choose actions
from the action sets $A$ and $B$. Actions, and only actions, are then publicly disclosed.

All sets are finite. Players share a common discount factor $\delta < 1$.

We make the following assumption: 

A.1 The set of states of the world is a product set, $S \times T$, with elements denoted
by $(s, t)$. Player $i$'s payoff depends only on his own action and on the $i$-th
component of the state. That is, the payoff function of player 1 is a function
$u : S \times A \to \mathbb{R}$, while player 2's payoff is given by a function $v : T \times B \to \mathbb{R}$.

A.2 Signal sets are also product sets, $L = L_S \times L_T$ and $M = M_S \times M_T$, with
elements denoted by $l = (l_S, l_T)$ and $m = (m_S, m_T)$. The random triples $(s, l_S, m_S)$ and
$(t, l_T, m_T)$ are drawn independently of each other according to the distributions
$p \in \Delta(S \times L_S \times M_S)$ and $q \in \Delta(T \times L_T \times M_T)$ respectively.^4

Assumption A.1 ensures that the game is one of pure informational externalities.
Player 1 cares about player 2's behavior only to the extent that player 2's behavior
conveys information about $s$.

The independence assumption A.2 is often made in games with two-sided incomplete
information, see, e.g., Zamir (1992). Not only does it imply that the two states
$s$ and $t$ are independent, but also that the two private signals of, say, player 1, $l_S$
and $l_T$, are independent. The first component $l_S$ of player 1's private signal should
be thought of as the information received by player 1 on his own state $s$, while the
second component $l_T$ is player 1's information on player 2's state, $t$.^5 Besides allowing
for tractability, assumption A.2 implies that behaving myopically is an equilibrium.
That is, assume that player 1 repeatedly plays an action $a_*$ that maximizes the
expectation of $u(s, a)$, given $l_S$. Then, by A.2, the belief held by player 2 over his own
state does not change along the play, and it is a best reply for player 2 to repeatedly
play an action $b_*$ that maximizes the expectation of $v(t, b)$, given $m_T$. And vice versa.
If instead the two states $s$ and $t$ were correlated, then the choice of $a_*$ might be
informative about $t$, and might lead to a change in player 2's action, which would in turn
be informative about $s$. But then, to manipulate this informational feedback, player
1 might be tempted to misrepresent his myopically optimal action $a_*$. Such an example is
provided in Section 3.2. Assumption A.2 assumes away these effects.

In a sense, assumptions A.1 and A.2 imply that the only motive for disclosing
information on the other player's state is to get back information in exchange: no
information will ever be exchanged, unless out of purely strategic reasons. We stress
that, although the triples $(s, l_S, m_S)$ and $(t, l_T, m_T)$ are independent, we make no
assumption on the distributions $p$ and $q$. Thus, the two signals relative to a given
state may be correlated in an arbitrary way among themselves, or with the state 
itself. 

The main question that we ask is whether and to what extent valuable information
can be exchanged at equilibrium, and how to organize the exchange of information
along the play, while meeting equilibrium requirements. Our main result consists of a
characterization of the limit set of sequential equilibrium payoffs, as players become
patient.

^4 Here, the state of the world is $(s, t)$, the signal to player 1 is $(l_S, l_T)$, and the signal to player 2
is $(m_S, m_T)$. We will refer to $s$ (resp., to $t$) as player 1's state (resp., player 2's state).

^5 The subscripts in $l_S$, $l_T$ serve a mnemonic purpose. We use boldface letters to denote random
variables, when we fear confusion might be at stake.

Strategies will be denoted by $\sigma$ and $\tau$ for players 1 and 2 respectively. A
behavior strategy of player $i$ maps his private information and the public history of
past moves into a mixed action. Accordingly, behavior strategies are maps
$\sigma : L \times H \to \Delta(A)$ and $\tau : M \times H \to \Delta(B)$, where $H = \bigcup_{n \geq 0} (A \times B)^n$ is the
set of finite sequences of moves. A strategy pair $(\sigma, \tau)$ induces a probability
distribution over the set of (infinite) plays. Expectations under this distribution are
denoted by $\mathbf{E}_{p,q,\sigma,\tau}$. Thus, the expected discounted payoff of player 1 is given by

$$\gamma^1(p, q, \sigma, \tau) = \mathbf{E}_{p,q,\sigma,\tau}\Bigl[(1-\delta)\sum_{n=1}^{\infty} \delta^{n-1} u(s, a_n)\Bigr].$$

We denote by $p_n \in \Delta(S \times M_S)$ the belief held by player 1 at stage $n$: it is the
conditional distribution of the pair $(s, m_S)$, given $l$ and the (public) sequence of previous
moves. Note that $p_1$ depends only on $l$, because there is no history of moves prior to
stage 1. However, the computation of $p_n$ involves player 2's strategy for $n > 1$. We
denote by $q_n \in \Delta(T \times L_T)$ the belief of player 2 at stage $n$ on his state of nature and
on the signal that player 1 received on this state of nature.
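As a small illustration of the normalization in player 1's expected discounted payoff, here is a Python sketch that evaluates $(1-\delta)\sum_{n\ge1}\delta^{n-1}u_n$ along a single realized stream of stage payoffs. The helper name is hypothetical, and truncating the infinite sum to a finite stream is an assumption made only for this sketch.

```python
from typing import Sequence


def normalized_discounted_payoff(stage_payoffs: Sequence[float], delta: float) -> float:
    """Return (1 - delta) * sum_{n >= 1} delta**(n - 1) * stage_payoffs[n - 1].

    Hypothetical helper: the infinite sum is truncated after
    len(stage_payoffs) stages, so the value is exact only for streams
    that are zero afterwards.
    """
    if not 0.0 <= delta < 1.0:
        raise ValueError("the discount factor must lie in [0, 1)")
    return (1.0 - delta) * sum(
        delta ** (n - 1) * payoff for n, payoff in enumerate(stage_payoffs, start=1)
    )


# A payoff of 1 in every stage is worth almost exactly 1 over a long horizon.
print(normalized_discounted_payoff([1.0] * 2000, 0.99))
# A single payoff of 1 received in stage 2 is worth (1 - delta) * delta.
print(normalized_discounted_payoff([0.0, 1.0], 0.5))
```

Since the weights $(1-\delta)\delta^{n-1}$ sum to one, a constant stream is worth its stage payoff, which is what makes discounted payoffs directly comparable with stage payoffs such as the myopically optimal ones.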

1.2 Preliminaries 

Loosely put, our main result is the following. Provided that each player holds infor- 
mation that is valuable to the other player, and that players are patient, any feasible 
and individually rational payoff is a sequential equilibrium payoff. Before stating
this result formally, we list some preliminary remarks and give a few definitions.

1.2.1 Myopically optimal payoffs 

Given a probability distribution $\pi \in \Delta(S)$ and an action $a \in A$, we write $u(\pi, a)$ for
the expected payoff of player 1 when holding the belief $\pi$ and playing action $a$:

$$u(\pi, a) = \mathbf{E}_\pi[u(\cdot, a)] = \sum_{s \in S} \pi(s)\, u(s, a).$$

We will often abuse notation and write $u(\pi, a)$ whenever $\pi$ is a distribution over a
product space of the form $S \times Q$, for some finite set $Q$. In that case,
$u(\pi, a) = \sum_{s \in S} \pi(\{s\} \times Q)\, u(s, a)$. Given a distribution $\pi \in \Delta(S)$, the myopically optimal payoff
when holding the belief $\pi$ is

$$u_*(\pi) := \max_{a \in A} u(\pi, a). \qquad (1)$$

For a given $a \in A$, the map $\pi \mapsto u(\pi, a)$ is affine. As a maximum of finitely many
affine functions, the map $u_*$ is convex and piecewise linear. An action $a \in A$ is optimal
at $\pi$ if it achieves the maximum in (1).
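The definitions of $u(\pi, a)$ and of the myopically optimal payoff in (1) can be sketched in Python as follows, assuming beliefs are stored as dictionaries from states to probabilities and payoffs are given as a function $u(s, a)$; all names are illustrative. The last lines check convexity of $u_*$ at one midpoint, using the binary payoffs of the Binary Example.

```python
from typing import Callable, Dict, Hashable, List, Sequence

Belief = Dict[Hashable, float]                   # pi(s) for each state s
Utility = Callable[[Hashable, Hashable], float]  # u(s, a)


def expected_payoff(pi: Belief, u: Utility, a: Hashable) -> float:
    """u(pi, a) = sum_s pi(s) * u(s, a)."""
    return sum(prob * u(s, a) for s, prob in pi.items())


def myopic_value(pi: Belief, u: Utility, actions: Sequence[Hashable]) -> float:
    """u_*(pi) = max_a u(pi, a), as in (1)."""
    return max(expected_payoff(pi, u, a) for a in actions)


def optimal_actions(pi: Belief, u: Utility, actions: Sequence[Hashable],
                    tol: float = 1e-12) -> List[Hashable]:
    """Actions achieving the maximum in (1), up to a numerical tolerance."""
    best = myopic_value(pi, u, actions)
    return [a for a in actions if expected_payoff(pi, u, a) >= best - tol]


def binary_u(s: int, a: int) -> float:
    """Binary Example payoff: 1 for a correct guess, 0 otherwise."""
    return 1.0 if s == a else 0.0


pi, pi_prime = {0: 0.3, 1: 0.7}, {0: 0.9, 1: 0.1}
mid = {s: 0.5 * pi[s] + 0.5 * pi_prime[s] for s in (0, 1)}
# Convexity: the value at the midpoint is at most the average of the two values.
print(myopic_value(mid, binary_u, [0, 1])
      <= 0.5 * myopic_value(pi, binary_u, [0, 1]) + 0.5 * myopic_value(pi_prime, binary_u, [0, 1]))
```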

Recall that $p_1 \in \Delta(S \times M_S)$ is the (interim) belief of player 1 prior to stage 1.
Thus, the probability $p_1(s, m_S)$ assigned to $(s, m_S)$ is equal to $p(s, m_S \mid l_S)$. A myopic
strategy of player 1 is a strategy $\sigma_m$ that repeats the same action, which is optimal at $p_1$. The
(ex ante) payoff induced by $\sigma_m$ does not depend on player 2's strategy, and is equal
to

$$u_* := \mathbf{E}_p[u_*(p_1)] = \sum_{l_S} p(l_S)\, u_*\bigl(p(\cdot \mid l_S)\bigr).$$

If player 2's strategy does not depend on $m_S$, the belief of player 1 does not change
along the play: $p_n = p_1$ for each $n \geq 1$. In such a case, the expected payoff of player
1 does not exceed $u_*$. Thus, $u_*$ is the minmax value for player 1 in the repeated game.
For similar reasons,

$$v_* := \mathbf{E}_q[v_*(q_1)] = \sum_{m_T} q(m_T)\, v_*\bigl(q(\cdot \mid m_T)\bigr)$$

is the minmax value for player 2.



Player 1's payoff is highest when he knows all he may possibly know about $s$, given
the rules of the game,^6 that is, when player 2's signal $m_S$ is made public. Player 1's
belief is then denoted by $\bar p \in \Delta(S)$; thus, $\bar p(s) = p(s \mid l_S, m_S)$, for each state $s \in S$.
Conditional on both signals $l_S$ and $m_S$, player 1's optimal payoff is $u_*(\bar p)$. Therefore,
player 1's ex ante expected payoff does not exceed

$$u_{**} := \mathbf{E}_p[u_*(\bar p)] = \sum_{l_S, m_S} p(l_S, m_S)\, u_*\bigl(p(\cdot \mid l_S, m_S)\bigr).$$

Since $u_*$ is convex, one has $u_{**} \geq u_*$. This reflects the fact that the marginal value
of the information held by player 2 is nonnegative.

^6 This can be formally deduced from Blackwell's theorem [5].



Conversely, any payoff in $[u_*, u_{**})$ is a feasible payoff for player 1, provided players
are patient. Therefore, the (limit) set of feasible and individually rational payoffs for
player 1 is the interval $[u_*, u_{**}]$.

Similarly, we define $\bar q = q(\cdot \mid l_T, m_T)$ and $v_{**} := \mathbf{E}_q[v_*(\bar q)]$. As for player 1, the limit
set of feasible and individually rational payoffs for player 2 is the interval $[v_*, v_{**}]$.
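The two benchmarks $u_*$ and $u_{**}$ differ only in what player 1 conditions on: $l_S$ alone, or the pair $(l_S, m_S)$. A minimal Python sketch, assuming the joint distribution $p(s, l_S, m_S)$ is given as a finite dictionary; the function names and the prior $0.6$ are hypothetical. The example reproduces the setting of Binary Example 1 (introduced just below), where player 2 observes $s$.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Sequence, Tuple

Joint = Dict[Tuple[Hashable, Hashable, Hashable], float]  # p(s, l_S, m_S)


def myopic_value(weights: Dict[Hashable, float], u: Callable, actions: Sequence) -> float:
    """max_a sum_s weights[s] * u(s, a). For weights P(s, E) this equals
    P(E) * u_*(p(. | E)), because u_* is positively homogeneous."""
    return max(sum(w * u(s, a) for s, w in weights.items()) for a in actions)


def value_with_information(p: Joint, u: Callable, actions: Sequence, key: Callable) -> float:
    """E_p[u_*(p(. | key(l_S, m_S)))]: expected myopic value when the player
    conditions on the information key(l_S, m_S)."""
    joint_by_key = defaultdict(lambda: defaultdict(float))  # key -> s -> P(s, key)
    for (s, l, m), w in p.items():
        joint_by_key[key(l, m)][s] += w
    return sum(myopic_value(weights, u, actions) for weights in joint_by_key.values())


def binary_u(s, a):
    return 1.0 if s == a else 0.0


# Hypothetical prior p(s = 1) = 0.6; L_S is a singleton and m_S = s.
p = {(0, "-", 0): 0.4, (1, "-", 1): 0.6}
u_low = value_with_information(p, binary_u, [0, 1], key=lambda l, m: l)         # u_*
u_high = value_with_information(p, binary_u, [0, 1], key=lambda l, m: (l, m))   # u_**
print(u_low, u_high)
```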

The example discussed in the introduction will serve as a leading example. Here,
all four sets $S$, $T$, $A$ and $B$ are equal to $\{0, 1\}$. Payoffs are given by $u(s, a) = 1$ if
$s = a$, and $u(s, a) = 0$ if $s \neq a$ (resp., $v(t, b) = 1$ if $t = b$, and $v(t, b) = 0$ if $t \neq b$). We
will refer to this setup as the Binary Example, and we will use it repeatedly, with
various information structures. Note that the myopically optimal action is $a = 1$ if
and only if the belief assigned by player 1 to $s = 1$ is at least 1/2.

Binary Example 1 Assume here that $L_S$ is a singleton, while player 2 observes $s$
(that is, $M_S = S$, and $p(s, m_S) = 0$ if $s \neq m_S$). The belief of player 1 can be identified
with the probability $\pi$ assigned to state 1, and $u_*(\pi) = \max\{\pi, 1 - \pi\}$. On the other
hand, since player 2 observes $s$, $\bar p$ is either 0 or 1, and $u_{**} = 1$.

1.2.2 Valuable information 

A player will not be willing to play a myopically suboptimal action unless he expects 
to receive information in return, the marginal value of which offsets the cost incurred 
when playing the suboptimal action. In particular, a necessary condition for improving
upon $(u_*, v_*)$ is that each player holds information that is valuable to the other
player.

We stress that it is not enough that each player holds information on the other
player's state. Indeed, if, e.g., the two signals $l_S$ and $m_S$ coincide $p$-a.s., player 1
already knows all player 2 knows relative to $s$. Nor is it enough that players have
private information, as the next example shows. 

Binary Example 2 Assume that $p(s = 1) > 1/2$ and that player 2 receives a binary
signal $m_S$ which (conditional on $s$) is correct with some probability $\rho$, where
$1/2 < \rho < p(s = 1)$. Assume moreover that $L_S$ is a singleton, so that player 1 has no
information, $p_1 = p$, and $u_* = p(s = 1)$. By Bayes' rule, the posterior probability $\bar p$ assigned
to state 1 exceeds $p(s = 1)$ if $m_S = 1$, and remains above 1/2 if $m_S = 0$, because $\rho < p(s = 1)$.
In either case, the optimal action at $\bar p$ is $a = 1$, and, therefore, knowing the information
of player 2 will not change the optimal behavior of player 1: the information held by
player 2 is valueless. Note that $u_{**} = \mathbf{E}[u(\bar p, 1)] = u(p, 1) = u_*$, where the middle equality
holds by the law of iterated expectations.

In the previous example, the information held by player 2 is valueless because it 
does not affect player 1's optimal action. Indeed, the precision of player 2's signal
is lower than the prior evidence that the state is 1, hence player 2's signal cannot 
provide decisive evidence against state 1. 
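To make Binary Example 2 concrete, the following Python sketch uses exact rational arithmetic with hypothetical values: prior probability $3/4$ on $s = 1$ and signal precision $2/3$. It computes $u_*$ and $u_{**}$ and confirms that they coincide; with a precision above the prior (here $9/10$), the signal $m_S = 0$ becomes decisive evidence against state 1 and $u_{**} > u_*$.

```python
from fractions import Fraction


def posterior_state1(prior, precision, signal):
    """P(s = 1 | m_S = signal) when m_S equals s with probability `precision`."""
    like1 = precision if signal == 1 else 1 - precision   # P(m_S = signal | s = 1)
    like0 = 1 - precision if signal == 1 else precision   # P(m_S = signal | s = 0)
    return prior * like1 / (prior * like1 + (1 - prior) * like0)


def values(prior, precision):
    """Return (u_*, u_**) in the Binary Example, where u_*(pi) = max(pi, 1 - pi)."""
    u_low = max(prior, 1 - prior)           # player 1 has no signal of his own
    u_high = Fraction(0)
    for m in (0, 1):
        prob_m = (prior * (precision if m == 1 else 1 - precision)
                  + (1 - prior) * (1 - precision if m == 1 else precision))
        post = posterior_state1(prior, precision, m)
        u_high += prob_m * max(post, 1 - post)
    return u_low, u_high


# Hypothetical numbers: prior 3/4 on s = 1, precision 2/3 < 3/4.
print(values(Fraction(3, 4), Fraction(2, 3)))   # information is valueless: u_** = u_*
# Precision 9/10 > 3/4: the signal m_S = 0 now flips the optimal action.
print(values(Fraction(3, 4), Fraction(9, 10)))
```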

This observation motivates the definition below. 

Definition 1 The information of player 2 is valuable for player 1 if

$$\mathbf{E}\bigl[u_*(\bar p) \mid l_S\bigr] > u_*(p_1), \quad \text{with } p\text{-probability } 1. \qquad (2)$$

Similarly, the information of player 1 is valuable for player 2 if $\mathbf{E}[v_*(\bar q) \mid m_T] > v_*(q_1)$,
with $q$-probability 1.

Condition (2) is an interim requirement: it is equivalent to requiring that, after
learning $l_S$, player 1 assigns positive probability to the event that his optimal action
would change if player 2's signal $m_S$ were made public.

This condition implies that $u_{**} > u_*$, but it is not implied by this inequality.
Indeed, the latter condition is an ex ante requirement, which amounts to requiring
that $\mathbf{E}[u_*(\bar p) \mid l_S] > u_*(p_1)$ holds with positive $p$-probability.
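Condition (2) can be checked mechanically on a finite information structure. The sketch below, with hypothetical names and numbers, multiplies both sides of (2) by $P(l_S)$, which leaves the strict inequality unchanged and lets one work with unnormalized weights, since $u_*$ is positively homogeneous. It is applied to versions of Binary Examples 1 and 2 with an assumed prior of $3/4$ (and precision $2/3$ in the second).

```python
from collections import defaultdict
from fractions import Fraction


def weighted_myopic_value(weights, u, actions):
    """max_a sum_s weights[s] * u(s, a). For weights P(s, E) this equals
    P(E) * u_*(p(. | E)), because u_* is positively homogeneous."""
    return max(sum(w * u(s, a) for s, w in weights.items()) for a in actions)


def is_valuable(p, u, actions):
    """Check condition (2): E[u_*(p_bar) | l_S] > u_*(p_1) for every l_S with
    positive probability. `p` maps (s, l_S, m_S) to a probability."""
    by_l = defaultdict(lambda: defaultdict(Fraction))   # l_S -> s -> P(s, l_S)
    by_lm = defaultdict(lambda: defaultdict(Fraction))  # (l_S, m_S) -> s -> P(s, l_S, m_S)
    for (s, l, m), w in p.items():
        by_l[l][s] += w
        by_lm[(l, m)][s] += w
    for l, weights in by_l.items():
        if sum(weights.values()) == 0:
            continue
        # Both sides of (2) are multiplied by P(l_S = l) > 0.
        lhs = sum(weighted_myopic_value(w_lm, u, actions)
                  for (l_key, _), w_lm in by_lm.items() if l_key == l)
        rhs = weighted_myopic_value(weights, u, actions)
        if not lhs > rhs:
            return False
    return True


def binary_u(s, a):
    return 1 if s == a else 0


F = Fraction
# Binary Example 1 (hypothetical prior 3/4): player 2 observes s.
example1 = {(0, "-", 0): F(1, 4), (1, "-", 1): F(3, 4)}
# Binary Example 2 (hypothetical prior 3/4, precision 2/3).
example2 = {(1, "-", 1): F(1, 2), (1, "-", 0): F(1, 4),
            (0, "-", 1): F(1, 12), (0, "-", 0): F(1, 6)}
print(is_valuable(example1, binary_u, [0, 1]), is_valuable(example2, binary_u, [0, 1]))
```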

Definition 1 does not provide a measure of information value, but only a criterion
for deciding whether the information held by a player has positive value or not.
Note that this criterion involves player 1's utility function $u$, and is therefore
game-dependent.

1.3 Main result 

Our main result is the following. 

Main Theorem 1 Assume that the information of each player is valuable to the
other player. Then, as $\delta \to 1$, the set of sequential equilibrium payoffs converges to
the set $[u_*, u_{**}] \times [v_*, v_{**}]$.

The proof of Theorem 1 is provided in the Appendix. Formally, we prove that
given any compact set $W \subset (u_*, u_{**}) \times (v_*, v_{**})$, there is $\delta_0 < 1$ such that each vector
in $W$ is a sequential equilibrium payoff of the $\delta$-discounted game, as soon as $\delta > \delta_0$.



Note that player 1's expected payoff does not exceed $(1 - \delta)u_* + \delta u_{**}$, because
player 1's action in stage 1 is based only upon $l$. It will follow from the proof of
Theorem 1 that equilibrium payoffs as high as $u_{**} - (1 - \delta)c$ can be implemented
in the $\delta$-discounted game, for some constant $c$. These two estimates give a rate of
convergence of the set of equilibrium payoffs in the $\delta$-discounted game.

The conclusion of the theorem is still valid when players hold different discount 
factors, which converge to 1. 

A few comments are in order. The theorem gives a sharp answer to the question 
we posed. There exist equilibria in which almost all information can be exchanged,
with a negligible delay, in spite of the fact that information exchange must be gradual 
and open-ended. This stands in sharp contrast to conclusions obtained in dynamic 
models of public good contribution, see Compte and Jehiel (2004). The driving force 
that explains this contrast is the following. Here, the cost of disclosing information is 
the opportunity cost of playing a suboptimal action, while the amount of information 
depends on the evolution of beliefs and, therefore, on the extent to which actions are 
correlated with private information. As a result, cost and amount are disentangled. 
By contrast, in a public good model, there is a one-to-one relation between the cost 
of a given monetary contribution, and the amount contributed. 

When the information of, say, player 2 has a positive ex ante value ($u_{**} > u_*$)
but is not valuable according to Definition 1, the conclusion of the theorem does not
hold. Indeed, whenever the signal $l_S$ received by player 1 fails to satisfy inequality
(2), player 1 will infer (at the interim stage) that the information held by player 2
has no value. Then, player 1 will not be willing to disclose any information to player
2, although the information he holds may be valuable to player 2. In Section 3 we
extend the characterization of the theorem to cover such cases.

The extension to an arbitrary number of players is outside of the scope of this 
paper. With more than two players, there exist cases where some player i holds no 
private information, yet receives information in equilibrium. The basic intuition is 
that player i may be 'rewarded' by some other player j, for information that has been 
disclosed to j by a third player k. 
2 The Binary Example

We here explain the main ideas of the equilibrium construction within the binary
setup. For simplicity, we assume here that, as in the example in the Introduction,
each player knows exactly the other player's state.^7

All equilibria share the feature that each player starts by reporting truthfully his
private signal relative to his own state, $l_S$ and $m_T$ respectively. The rationale for
this report is the following. In the continuation game, no player then holds private 
information on his own state. Consequently, player i will always know the posterior 
belief of player j on j's state. This in turn allows player i to compute the perceived 
cost incurred by player j when playing either action, and to adjust accordingly the 
amount of information he discloses. Of course, adequate incentives will have to be 
provided to ensure truthful reporting. 

We first deal with games in which players get no information on their own state, 
which we call self-ignorant games. We next explain how to provide incentives for 
a truthful report in the general case. The discussion of how to combine these two 
logically independent steps is relegated to the Appendix. 

2.1 The self-ignorant case

In this section, we analyze the game discussed in the introduction. The states $s$ and
$t$ are drawn according to $p \in \Delta(S)$ and $q \in \Delta(T)$. Player 1 is told $t$, and player
2 is told $s$. Hence, $p$ (and $q$) may be viewed as a distribution over $S = \{0, 1\}$. We
identify any distribution over $S$ (resp., over $T$) with the probability assigned to state
$s = 1$ (resp., to state $t = 1$). For concreteness, we assume that $p$ and $q$ are such that
$p, q > 1/2$.

2.1.1 A first equilibrium profile 

We here describe one specific equilibrium profile that will serve as a building block 
for the general construction. In this profile, the play is divided into two phases. 
In the first phase, at odd stages (resp. at even stages) player 1 (resp. player 2)
randomizes, thereby transmitting information to player 2 (resp. to player 1), while
player 2 (resp. player 1) plays his myopically optimal action. The second phase starts
when the player who is randomizing plays his myopically optimal action, and lasts ad
infinitum. Along this phase both players play their myopically optimal action. Thus,
the play path looks as follows: in the first few stages the players alternately play their
myopically suboptimal action, and then the play switches to myopic play.

^7 That is, $l_T = t$ with $q$-probability 1, and $m_S = s$ with $p$-probability 1.

Randomizations are informative: in stage 1 for instance, the mixed action used 
by player 1 depends on $t$, so that the belief of player 2 in stage 2 depends on player
1's action in stage 1.

The play in the first phase has a cyclic pattern, with period 4. The evolution of 
beliefs along the play is illustrated in Figure 1. This figure involves two parameters, 
p* and q*, which will later be pinned down by equilibrium requirements. 



[Figure 1: The play as long as both players play suboptimally. Panels: Stage 1, Stage 3, Stage 4, Stage 5; belief pairs shown include (p, q) and (p, 1 − q).]

How to read this figure? Consider first stage 1. In stage 1, player 2 plays with
probability 1 his optimal action (which is $b = 1$ since $q > 1/2$). Consequently, the belief
of player 1 in stage 2 is $p$, the same as in stage 1. Meanwhile, player 1 plays his
suboptimal action $a = 0$ with probability $x_t$, which depends on the true state $t$. The values
of $x_0$ and $x_1$ are defined to be

$$x_1 := \underline{x} := \frac{1-q}{q} \cdot \frac{q^* - q}{q + q^* - 1} \quad \text{and} \quad x_0 := \bar{x} := \frac{q}{1-q} \cdot \frac{q^* - q}{q + q^* - 1},$$

where $q^*$ will be defined below. By Bayesian updating, the posterior belief $q_2$ of player 2
in stage 2 is equal to $1 - q < 1/2$ following $a = 0$, and is equal to $q^* > q$ following $a = 1$.
In player 2's eyes, the suboptimal action $a = 0$ is played with probability

$$x := q\,\underline{x} + (1 - q)\,\bar{x} = \frac{q^* - q}{q + q^* - 1}.$$

Note that $x$ and $q^*$ solve $q = x(1 - q) + (1 - x)q^*$, which reflects the martingale
property of beliefs. Because $q > 1/2$, it follows that $q^* > q$ and therefore $\underline{x} < \bar{x}$.
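The choice of $x_1$ and $x_0$ can be checked with exact arithmetic. The Python sketch below takes hypothetical values $q = 2/3$ and $q^* = 3/4$ (in the text, $q^*$ is pinned down later by equilibrium requirements) and verifies that Bayes' rule sends player 2's belief to $1 - q$ after $a = 0$ and to $q^*$ after $a = 1$, and that the martingale identity holds.

```python
from fractions import Fraction


def stage1_mixing(q, q_star):
    """Return (x_low, x_high, x): the probability of the suboptimal action a = 0
    given t = 1, given t = 0, and unconditionally (in player 2's eyes)."""
    x = (q_star - q) / (q + q_star - 1)
    x_low = (1 - q) / q * x     # x_1: used in the state player 2 considers more likely
    x_high = q / (1 - q) * x    # x_0: used in the other state
    return x_low, x_high, x


q, q_star = Fraction(2, 3), Fraction(3, 4)   # hypothetical values, 1/2 < q < q_star < 1
x_low, x_high, x = stage1_mixing(q, q_star)
# Bayes' rule for player 2's belief that t = 1 after each action of player 1.
belief_after_a0 = q * x_low / (q * x_low + (1 - q) * x_high)
belief_after_a1 = q * (1 - x_low) / (q * (1 - x_low) + (1 - q) * (1 - x_high))
print(x_low, x_high, x, belief_after_a0, belief_after_a1)  # prints: 1/10 2/5 1/5 1/3 3/4
```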
Consider next stage 2. If player 1 played his optimal action $a = 1$ in stage 1,
players stop exchanging information, and repeat forever their optimal actions, $a = 1$
and $b = 1$. If player 1 played his suboptimal action $a = 0$, player 2 reciprocates
and assigns a probability $y_s$ to his suboptimal action (which is now $b = 1$ because
$1 - q < 1/2$). The values of $y_0$ and $y_1$ are set to

$$y_1 := \underline{y} := \frac{1-p}{p} \cdot \frac{p^* - p}{p + p^* - 1} \quad \text{and} \quad y_0 := \bar{y} := \frac{p}{1-p} \cdot \frac{p^* - p}{p + p^* - 1}$$

(where $p^*$ is defined below), so that the belief of player 1 in stage 3 is either $p^*$ or $1 - p$.
The overall probability assigned to the action $b = 1$ is $y := p\,\underline{y} + (1 - p)\,\bar{y}$, which solves
$p = y(1 - p) + (1 - y)p^*$.

Consider now stage 3 (assuming $a = 0$ was played in stage 1). If the optimal
action $b = 0$ was played in stage 2, players stop exchanging information and repeat
their optimal actions, $a = 1$ and $b = 0$. If player 2 played his suboptimal action
in stage 2, player 1 reciprocates and plays as in stage 1, only with the roles of the
states/actions exchanged. To be precise, player 1 assigns positive probability to his
suboptimal action (which is now $a = 1$ because $1 - p < 1/2$). As in stage 1, this
probability is set to $\underline{x}$ (the smaller of the two stage-1 probabilities) if the true state $t$
happens to be the state which player 2 currently considers more likely (which is now state $t = 0$
because $q_3 = 1 - q < 1/2$), and it is set to $\bar{x}$ otherwise. By Bayesian updating, the belief
of player 2 in stage 4 is either $q$ or $1 - q^*$. And so on.

Observable deviationqj by player i are ignored by player j. 

The value of $p^*$ is dictated by equilibrium requirements. Observe first that, at equilibrium and at any node where player 1 is supposed to randomize, his expected continuation payoff must be equal to $u_*(p)$. Indeed, consider any such node, say at stage $2n + 1$. If he plays his optimal action, player 1 gets an expected payoff of $u_*(p_{2n+1})$ in stage $2n + 1$, and in all future stages as well, since the players then stop exchanging information. Note that $u_*(p_{2n+1}) = u_*(p)$, because $p_{2n+1}$ is equal to either $p$ or $1 - p$, depending on whether $n$ is even or odd. Because he is randomizing, player 1 should be indifferent between both actions, and the claim follows.

Let us now consider stage 1. If player 1 plays his suboptimal action in stage 1, his continuation payoff from stage 3 on is equal to $u_*(p)$ if player 2 plays $b = 1$ in stage 2, and to $u_*(p^*)$ if player 2 plays $b = 0$. Since the probabilities of these two events are equal to $y$ and $1 - y$ respectively, the overall payoff of player 1 is then equal to

$$(1-\delta)\bigl(1-u_*(p)\bigr) + \delta(1-\delta)\,u_*(p) + \delta^2\bigl(y\,u_*(p) + (1-y)\,u_*(p^*)\bigr),$$

where the first two terms are the contributions of the first two stages to the overall payoff. When equating this last expression with $u_*(p)$, and using $p > 1/2$, one obtains $y = (1-\delta)/\delta^2$, that is,

$$p^* = p + (2p-1)\,\frac{1-\delta}{\delta^2+\delta-1}.$$

The same argument, when applied to player 2, yields

$$q^* = q + (2q-1)\,\frac{1-\delta}{\delta^2+\delta-1}.$$

⁸ Such deviations consist in playing the currently suboptimal action at a node at which the optimal action was expected.
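The closed form for $p^*$ can be sanity-checked numerically. The sketch below assumes, as in the binary example, that a stage pays 1 when the action matches the state. Under that assumption $u_*(p) = \max(p, 1-p)$, and the suboptimal action pays $1 - u_*(p)$. The values $p = 3/5$ and $\delta = 9/10$ are illustrative. With them, player 1's payoff from playing $a = 0$ in stage 1 comes out exactly equal to $u_*(p)$.

```python
from fractions import Fraction


def u_star(p):
    """Myopic value in the binary example: max(p, 1 - p)."""
    return max(p, 1 - p)


def p_star(p, delta):
    """Closed form p* = p + (2p - 1)(1 - delta) / (delta^2 + delta - 1)."""
    return p + (2 * p - 1) * (1 - delta) / (delta ** 2 + delta - 1)


def payoff_suboptimal_first(p, delta):
    """Player 1's payoff when he plays a = 0 in stage 1 (prior p > 1/2)."""
    ps = p_star(p, delta)
    y = (ps - p) / (p + ps - 1)          # overall probability that player 2 plays b = 1
    return ((1 - delta) * (1 - u_star(p))
            + delta * (1 - delta) * u_star(p)
            + delta ** 2 * (y * u_star(p) + (1 - y) * u_star(ps)))


p, delta = Fraction(3, 5), Fraction(9, 10)
print(p_star(p, delta))                   # 223/355
print(payoff_suboptimal_first(p, delta))  # 3/5, i.e. u_*(p)
```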

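The parameter values $p^*, q^*, \bar x, \underline x, \bar y, \underline y$ all lie in $(0, 1)$ as soon as $\varepsilon_\delta < p, q < 1 - \varepsilon_\delta$, where $\varepsilon_\delta = \frac{1-\delta}{1-\delta+\delta^2}$, that is, as soon as initial beliefs are not too precise. Note that $\varepsilon_\delta$ goes to 0 as $\delta$ goes to 1.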
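The range condition can also be explored numerically. For $p > 1/2$, the binding constraints appear to be $p^* < 1$ and $\underline y < 1$. Solving each by hand with the formulas above gives the same bound, $p < \delta^2/(1-\delta+\delta^2)$, so that $\varepsilon_\delta = (1-\delta)/(1-\delta+\delta^2)$. The sketch below is our own check, with hypothetical helper names. It compares this threshold with a direct evaluation of both constraints on a grid of priors for $\delta = 9/10$.

```python
from fractions import Fraction


def eps(delta):
    """Candidate threshold eps_delta = (1 - delta) / (1 - delta + delta^2)."""
    return (1 - delta) / (1 - delta + delta ** 2)


def parameters_ok(p, delta):
    """For a prior p > 1/2: True if p* < 1 and y_0 < 1 (the binding constraints)."""
    ps = p + (2 * p - 1) * (1 - delta) / (delta ** 2 + delta - 1)
    y = (ps - p) / (p + ps - 1)
    y0 = p / (1 - p) * y
    return ps < 1 and y0 < 1


delta = Fraction(9, 10)
print(eps(delta))                                   # 10/91
grid = [Fraction(k, 1000) for k in range(501, 1000)]
print(all(parameters_ok(p, delta) == (p < 1 - eps(delta)) for p in grid))   # True
```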

Conversely, and whenever $\varepsilon_\delta < p, q < 1 - \varepsilon_\delta$, this profile is a Nash equilibrium of the repeated game. Observe that, while player 1's payoff is $u_*(p)$, the payoff of player 2 is equal to

$$f(q) := (1-\delta)\,v_*(q) + \delta\bigl(x\,v_*(q) + (1-x)\,v_*(q^*)\bigr) = v_*(q) + \frac{1-\delta}{\delta}\bigl(2v_*(q) - 1\bigr) > v_*(q).$$
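The two expressions for $f(q)$ can be compared directly. In this sketch, $x$ is recovered from the martingale condition $q = x(1-q) + (1-x)q^*$, which is the analogue, for player 2, of the condition satisfied by $y$. The sketch also assumes $v_*(q) = \max(q, 1-q)$, as in the binary example. Helper names are illustrative.

```python
from fractions import Fraction as F


def v_star(q):
    """Myopic value of player 2 in the binary example, max(q, 1 - q)."""
    return max(q, 1 - q)


def q_star(q, delta):
    """q* = q + (2q - 1)(1 - delta) / (delta^2 + delta - 1)."""
    return q + (2 * q - 1) * (1 - delta) / (delta ** 2 + delta - 1)


def f_by_definition(q, delta):
    """(1 - d) v*(q) + d (x v*(q) + (1 - x) v*(q*)), with x from q = x(1 - q) + (1 - x) q*."""
    qs = q_star(q, delta)
    x = (qs - q) / (q + qs - 1)
    return (1 - delta) * v_star(q) + delta * (x * v_star(q) + (1 - x) * v_star(qs))


def f_closed_form(q, delta):
    return v_star(q) + (1 - delta) / delta * (2 * v_star(q) - 1)


delta = F(9, 10)
print(f_by_definition(F(7, 10), delta), f_closed_form(F(7, 10), delta))   # 67/90 67/90
```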

2.1.2 Further equilibrium payoffs 

We here build upon the previous section, and introduce a class of simple equilibrium profiles that implement all equilibrium payoffs (in the limit $\delta \to 1$).

In these equilibria, most of the information on player 2's state that player 1 will ever transmit is transmitted at stage 1 by randomizing in that stage, and similarly, most of the information on player 1's state that player 2 will ever transmit is transmitted in stage 2. From stage 3 on, the players either implement the equilibrium profile defined in the previous section, or switch to a myopic behavior. The equilibrium that is implemented from stage 3 on depends on the cost of information transmission in stages 1 and 2.

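We will now be more precise. Let $\underline q$, $\bar q$, $\underline p^0$, $\bar p^0$, $\underline p^1$ and $\bar p^1$ be arbitrary beliefs in $(0, 1)$, such that $0 < \underline q < 1/2 < q < \bar q < 1$, and $0 < \underline p^a < 1/2 < p < \bar p^a < 1$ for $a \in \{0, 1\}$. A further condition will be imposed later. We let the discount factor $\delta$ be sufficiently large such that all six beliefs, which are determined below, lie in the interval $(\varepsilon_\delta, 1 - \varepsilon_\delta)$.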
Player 1 randomizes in stage 1, and plays his optimal action in stage 2. Player 2 plays his optimal action in stage 1, and randomizes in stage 2. We choose these state-dependent randomizations in such a way that the beliefs of the players evolve as indicated in Figure 2. Continuation payoffs from stage 3 on also appear in this figure. Observe that the optimal action of player 2 in stage 2 depends on the action played by player 1 in stage 1, since $\underline q < 1/2 < \bar q$.
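The state-dependent randomizations can be obtained by a standard belief-splitting computation. The sketch below is our own illustration, with a hypothetical `split` helper and illustrative numbers. It takes a prior on state 1 and two target posteriors `lo < prior < hi`. It returns the probabilities with which a player who knows the state must play a given action, so that the other player's posterior is `lo` after that action and `hi` after the other one. In Section 2.1.2 the informed player would be player 2 in stage 2, with targets $\underline p^a$ and $\bar p^a$.

```python
from fractions import Fraction as F


def split(prior, lo, hi):
    """State-dependent probabilities of the action that should lead to posterior `lo`.

    Returns (z1, z0) = P(action | state 1), P(action | state 0), chosen so that the
    posterior on state 1 is `lo` after this action and `hi` after the other one.
    Requires lo < prior < hi.
    """
    z = (hi - prior) / (hi - lo)        # overall probability (martingale property)
    z1 = z * lo / prior
    z0 = z * (1 - lo) / (1 - prior)
    return z1, z0


def posterior(prior, prob_if_1, prob_if_0):
    """Bayes' rule for the probability of state 1 after an action."""
    return prior * prob_if_1 / (prior * prob_if_1 + (1 - prior) * prob_if_0)


# Hypothetical numbers: player 1's belief 3/5 is split into 1/5 or 4/5.
z1, z0 = split(F(3, 5), F(1, 5), F(4, 5))
print(z1, z0)                                   # 1/9 2/3
print(posterior(F(3, 5), z1, z0))               # 1/5
print(posterior(F(3, 5), 1 - z1, 1 - z0))       # 4/5
```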



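Figure 2: The beliefs in the first three stages. Player 1's stage-1 action $a$ moves player 2's belief from $q$ to $\underline q$ or $\bar q$. Player 2's stage-2 action then moves player 1's belief from $p$ to $\underline p^a$ or $\bar p^a$. At the nodes with belief $\underline p^a$, the players exchange information (continuation payoffs $u_*(\underline p^a)$ and $f(\cdot)$ at player 2's current belief). At the nodes with belief $\bar p^a$, they play myopically (continuation payoffs $u_*(\bar p^a)$ and $v_*(\cdot)$).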

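Since all beliefs lie in $(\varepsilon_\delta, 1 - \varepsilon_\delta)$, this strategy profile is well-defined. As an example, consider the top final node in Figure 2, at which information exchange continues. At that node, player 1's belief is $\underline p^a$ for the relevant stage-1 action $a$, and player 2's belief is either $\underline q$ or $\bar q$. The players then switch to the equilibrium profile which we designed in the previous section, taking these two beliefs as initial beliefs.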

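We claim that equilibrium conditions for player 2 are satisfied. Indeed, note that the function $f(\cdot)$ solves

$$v_*(q) = (1-\delta)\bigl(1 - v_*(q)\bigr) + \delta f(q), \quad \text{for each } q \in [0, 1].$$

The left-hand side is the payoff of player 2, with belief $q$ in stage 2, if he plays his optimal action and play is myopic thereafter. The right-hand side is his payoff if he plays his suboptimal action and information exchange follows. Hence, whatever the action played by player 1 in stage 1, player 2 is indifferent between his two actions at stage 2, as desired.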
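This indifference can be checked numerically for several beliefs. The sketch assumes the closed form $f(q) = v_*(q) + \frac{1-\delta}{\delta}(2v_*(q) - 1)$ derived in the previous section and $v_*(q) = \max(q, 1-q)$. The helper names and the discount factor $19/20$ are illustrative.

```python
from fractions import Fraction


def v_star(q):
    """Myopic value of player 2 in the binary example, max(q, 1 - q)."""
    return max(q, 1 - q)


def f(q, delta):
    """Closed form f(q) = v*(q) + ((1 - delta) / delta) (2 v*(q) - 1)."""
    return v_star(q) + (1 - delta) / delta * (2 * v_star(q) - 1)


def stage2_payoffs(q2, delta):
    """Player 2's payoffs at stage 2 when his belief is q2.

    optimal action: v*(q2) in stage 2 and in every later stage (myopic play);
    suboptimal action: 1 - v*(q2) in stage 2, then f(q2) from stage 3 on.
    """
    optimal = v_star(q2)
    suboptimal = (1 - delta) * (1 - v_star(q2)) + delta * f(q2, delta)
    return optimal, suboptimal


delta = Fraction(19, 20)
for q2 in (Fraction(1, 5), Fraction(2, 5), Fraction(3, 5), Fraction(9, 10)):
    optimal, suboptimal = stage2_payoffs(q2, delta)
    print(q2, optimal == suboptimal)     # True for every q2
```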

The overall payoff to player 2 is equal to $(1-\delta)\,v_*(q) + \delta\bigl(x\,v_*(\underline q) + (1-x)\,v_*(\bar q)\bigr)$. As $\delta$ goes to 1, the weight of the first stage decreases to 0. Since $\underline q$ and $\bar q$ were arbitrary, this payoff spans the whole interval $(q, 1) = (v_*, v_{**})$.

It remains to ensure that player 1 is indifferent between both actions in stage 1. 
That is, the difference in payoffs in stage 1 should be offset by a difference in the 
expected continuation payoffs. This is done by adjusting the amount of information 
disclosed by player 2 in stage 2 to the action played by player 1 in stage 1: more 
information is disclosed if player 1 plays his suboptimal action in stage 1. 






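Formally, the indifference condition translates to:

$$(1-\delta)\bigl(u_*(p) - (1-u_*(p))\bigr) = \delta^2\bigl(y^0 u_*(\underline p^0) + (1-y^0)\,u_*(\bar p^0)\bigr) - \delta^2\bigl(y^1 u_*(\underline p^1) + (1-y^1)\,u_*(\bar p^1)\bigr), \tag{3}$$

where $y^a$ is the probability that player 2 plays his suboptimal action in stage 2 after player 1 played $a$ in stage 1, so that $p = y^a \underline p^a + (1-y^a)\bar p^a$. A necessary condition is that $\underline p^0, \bar p^0$ be chosen such that the overall payoff when playing the suboptimal action in stage 1 is at least $u_*(p)$, since playing the optimal action in stage 1 yields at least $u_*(p)$. Conversely, one can check that for any such choice of $\underline p^0, \bar p^0$, there exist $\underline p^1, \bar p^1 \in (\underline p^0, \bar p^0)$ such that equation (3) holds. The overall payoff to player 1 is then equal to

$$(1-\delta)\bigl(1 - u_*(p)\bigr) + \delta(1-\delta)\,u_*(p) + \delta^2\bigl(y^0 u_*(\underline p^0) + (1-y^0)\,u_*(\bar p^0)\bigr).$$

As $\delta$ goes to 1, the weight of the first two stages decreases to zero. Since $\underline p^0, \bar p^0$ are arbitrary (subject to the individual rationality condition), this payoff spans the whole interval $(p, 1) = (u_*, u_{**})$.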
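One way to see that equation (3) can be met is to search numerically. The sketch below assumes the binary payoffs, so $u_*(p) = \max(p, 1-p)$. It fixes hypothetical beliefs $\underline p^0 = 0.2$ and $\bar p^0 = 0.9$ after $a = 0$. Purely for convenience, it looks for $\underline p^1, \bar p^1$ on the path obtained by shrinking $[\underline p^0, \bar p^0]$ towards $p$; this parametrization is ours, not the paper's. It then checks that player 1 is indifferent in stage 1.

```python
def u_star(p):
    """Myopic value in the binary example, max(p, 1 - p)."""
    return max(p, 1 - p)


def expected_continuation(p, lo, hi):
    """y u*(lo) + (1 - y) u*(hi), where y = (hi - p) / (hi - lo) is the probability of reaching lo."""
    y = (hi - p) / (hi - lo)
    return y * u_star(lo) + (1 - y) * u_star(hi)


def payoff(p, delta, stage1_value, lo, hi):
    """Stage-1 payoff, optimal play in stage 2, then continuation values from stage 3 on."""
    return ((1 - delta) * stage1_value
            + delta * (1 - delta) * u_star(p)
            + delta ** 2 * expected_continuation(p, lo, hi))


def solve_after_optimal(p, delta, lo0, hi0, iters=100):
    """Find (lo1, hi1) satisfying equation (3) on the hypothetical path
    lo1 = lo0 + t (p - lo0), hi1 = hi0 - t (hi0 - p), by bisection on t."""
    target = (expected_continuation(p, lo0, hi0)
              - (1 - delta) * (2 * u_star(p) - 1) / delta ** 2)
    if target <= u_star(p):
        raise ValueError("individual rationality fails for (lo0, hi0)")
    a, b = 0.0, 1.0
    for _ in range(iters):
        t = (a + b) / 2
        lo1, hi1 = lo0 + t * (p - lo0), hi0 - t * (hi0 - p)
        if expected_continuation(p, lo1, hi1) > target:
            a = t          # still too much information after a = 1: shrink further
        else:
            b = t
    t = (a + b) / 2
    return lo0 + t * (p - lo0), hi0 - t * (hi0 - p)


p, delta = 0.6, 0.9
lo0, hi0 = 0.2, 0.9                       # hypothetical beliefs after a = 0
lo1, hi1 = solve_after_optimal(p, delta, lo0, hi0)
after_a0 = payoff(p, delta, 1 - u_star(p), lo0, hi0)
after_a1 = payoff(p, delta, u_star(p), lo1, hi1)
print(round(lo1, 4), round(hi1, 4))       # 0.2288 0.8784
print(abs(after_a0 - after_a1) < 1e-9)    # True
```

Shrinking both endpoints proportionally keeps the splitting probability constant and lowers the expected continuation value monotonically, which is why plain bisection suffices here. Other one-parameter families would work as well.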



2.2 The self-informed case

While sticking to the binary setup, we now allow for private signals on one's own state, and we informally describe the main ideas of the construction. All details appear in the Appendix. All equilibria share a common structure. Equilibrium play is divided into four successive phases.

In phase 1, players 1 and 2 report their signals $l_S$ and $m_T$ by encoding them into finite strings of actions in a one-to-one way.

The crucial phase is phase 2. It is designed so as to provide incentives for truthful reporting in phase 1. In expectation, very little information is exchanged in that phase. It is organized as follows. Each player $i$ draws a random message⁹ and 'sends' it to player $j$ (again, by encoding messages into sequences of actions). This message is slightly correlated with player $j$'s state, but is independent of the report of player $j$ in phase 1.¹⁰ Next, player $j$ plays a long and deterministic sequence of actions. We refer to these two subphases as phases 2.1 and 2.2 respectively. The length of phase 2 is of the order of $\ln(1-\theta)/\ln\delta$ for some positive $\theta$, hence phase 2 contributes a fraction $\theta$ of the total discounted payoff.¹¹ The prescribed sequence of actions depends both on player $j$'s report in phase 1, and on player $i$'s message in phase 2. The intuition behind this structure is discussed later.

⁹ We distinguish between signals, drawn by nature, and messages, chosen by the players and possibly subject to strategic considerations.

¹⁰ Even if drawn independently of player $j$'s report, it is crucial that it is sent only after player $j$'s report.

¹¹ The value of $\theta$ is independent of $\delta$, but may have to be adjusted to the equilibrium payoff.
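The link between the length of phase 2 and the fraction $\theta$ comes from the normalized weights $(1-\delta)\delta^k$: the first $L$ stages weigh $1 - \delta^L$ in total. The small sketch below uses a hypothetical helper and a hypothetical value of $\theta$. It ignores the short phase 1 that precedes phase 2, which only rescales the weight by a factor close to 1 as $\delta \to 1$.

```python
import math


def phase2_length(theta, delta):
    """Smallest number L of stages whose total normalized weight 1 - delta**L is at least theta."""
    return math.ceil(math.log(1 - theta) / math.log(delta))


theta = 0.05   # hypothetical value of theta
for delta in (0.9, 0.99, 0.999, 0.9999):
    L = phase2_length(theta, delta)
    # L grows like -ln(1 - theta) / (1 - delta), while the weight 1 - delta**L approaches theta.
    print(delta, L, round(1 - delta ** L, 4))
```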

In the continuation game that starts after phase 2, players hold no private information about their own state, and they implement an equilibrium of the resulting self-ignorant game. The bulk of valuable information exchange takes place in phase 3. Each player $i$ draws a random message that is correlated with player $j$'s state, and 'sends' it to player $j$. The degree of correlation is adjusted as a function of player $j$'s equilibrium payoff. The (conditional) law of player $i$'s message does not depend on player $j$'s report.

Phase 4 consists of the remaining stages. Prior to the first stage of phase 4, each player $i$ assesses the belief held by player $j$, assuming reports in phase 1 were truthful. He also assesses the total discounted cost incurred by player $j$ in all previous stages, with the exception of the stages of phase 2.2. Players switch to the equilibrium profile, as designed in the previous section, which is associated with a payoff equal to the myopic payoff $(u_*(\cdot), v_*(\cdot))$ at the assessed beliefs, plus a bonus that exactly offsets the cost incurred earlier. Because few stages are taken into account for the computation of the past cost, this bonus is at most of the order of $(1-\delta)c$, for some positive $c$. Because it is small, hardly any valuable information is exchanged during phase 4. The role of the bonus is to ensure that each player, when randomizing in either phase 2 or phase 3, is indifferent between all messages.
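The size of the bonus can be illustrated with simple discounting arithmetic. Under normalized discounting, a cost $c_k$ in stage $k$ lowers the total payoff by $(1-\delta)\delta^k c_k$. A per-stage bonus $B$ from stage $N$ on raises it by $\delta^N B$, so exact compensation requires $B = \sum_k (1-\delta)\delta^{k-N} c_k$. The sketch below uses hypothetical costs, a hypothetical starting stage and a hypothetical helper name. It only illustrates why $B$ is of order $1-\delta$ when few, bounded costs are compensated and $\delta^N$ stays bounded away from 0; it is not the paper's exact computation.

```python
def compensating_bonus(costs, start, delta):
    """Per-stage bonus, paid from stage `start` on, that offsets earlier costs.

    `costs` maps a stage index k < start (stages numbered from 0) to the cost
    c_k incurred in that stage. With normalized discounting, stage k has weight
    (1 - delta) * delta**k and the stages from `start` on have total weight
    delta**start, so the bonus is sum_k (1 - delta) * delta**(k - start) * c_k.
    """
    if any(k >= start for k in costs):
        raise ValueError("costs must be incurred before the bonus starts")
    return sum((1 - delta) * delta ** (k - start) * c for k, c in costs.items())


# Hypothetical example: bounded costs in three early stages, bonus from stage 40 on.
costs = {2: 0.3, 5: 0.5, 9: 0.2}
for delta in (0.99, 0.999):
    bonus = compensating_bonus(costs, start=40, delta=delta)
    print(delta, round(bonus, 5), round(bonus / (1 - delta), 3))
```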

Observable deviations of player $i$ trigger myopic play by player $j$.

We now provide a few insights into phase 2, by means of two examples. Phase 2 
is designed so that the expected payoff of a player in phase 2.2 is strictly higher when 
reporting truthfully than when not. 

Assume first that $L_S = \{l_s, \bar l_s\}$, and that player 1 assigns to state $s = 1$ a posterior probability $p(1 \mid l_s) < 1/2$ when observing $l_s$, and $p(1 \mid \bar l_s) > 1/2$ when observing $\bar l_s$. Note that the optimal action of player 1 is $a = 0$ in the former case, and $a = 1$ in the latter case. Here, incentives can be provided by simply requiring that player 1 repeats the action which is optimal given the signal he reported. Indeed, assume for concreteness that player 1 receives $l_S = l_s$. If player 1 reports truthfully $l_s$ and plays as required the action $a = 0$, his expected payoff is $1 - p(1 \mid l_s) > 1/2$ in all stages of phase 2.2. If instead player 1 untruthfully claims that he received $\bar l_s$ and next plays the action $a = 1$, which is optimal given his report but in fact suboptimal, his expected payoff is only $p(1 \mid l_s) < 1/2$. Such an incentive scheme is appropriate whenever the myopically optimal action of player 1 is in one-to-one relation with $l_S$.

In some cases, more complex schemes are needed. Assume that $L_S = \{l_s, l^*_s, \bar l_s\}$, and that the posterior probability assigned to state $s = 1$ is respectively low, intermediate and high following $l_s$, $l^*_s$ and $\bar l_s$. Here, conditional on $s$, we let player 2 send a message $\mu$ in $\{*, 0, 1\}$. The probability of $\mu = *$ is close to 1, and does not depend on $s$, so that this message conveys no information. Conditional on $\mu \neq *$, the message $\mu$ coincides with $s$ with probability $1 - \varepsilon$, where $\varepsilon > 0$ is small. Next, player 1 is to repeat a sequence $\alpha$ of actions of length 8, which depends on both report and disclosed state:

• If $\mu = *$, the sequence $\alpha$ is independent of player 1's report;

• Following a report of $l^*_s$, the sequence $\alpha$ contains four 0's (and four 1's), irrespective of the message sent by player 2;

• Following a report of $l_s$, $\alpha$ contains six 0's (and two 1's) if player 2's message is $\mu = 0$, and seven 0's if $\mu = 1$;

• Following a report of $\bar l_s$, $\alpha$ contains six 1's if $\mu = 1$, and seven 1's if $\mu = 0$.

It is straightforward to check that strict incentive requirements are met for $\varepsilon > 0$ small enough.
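The incentive claim can be checked with exact arithmetic. In the sketch below, the posteriors $1/4$, $1/2$, $3/4$ after $l_s$, $l^*_s$, $\bar l_s$ and the value $\varepsilon = 1/100$ are hypothetical choices of ours, and stage payoffs are 1 when the action matches the state. For each possible report, the sketch computes the expected total payoff over the 8 stages of phase 2.2, conditional on $\mu \neq *$. It finds that truthful reporting is strictly best for each signal.

```python
from fractions import Fraction

# Number of 0's in the length-8 sequence alpha, by report and by player 2's
# message mu in {0, 1}; the case mu = * does not depend on the report and is omitted.
ZEROS = {
    "l": {0: 6, 1: 7},      # report l_s: six 0's if mu = 0, seven 0's if mu = 1
    "l*": {0: 4, 1: 4},     # report l*_s: four 0's and four 1's
    "lbar": {0: 1, 1: 2},   # report lbar_s: seven 1's if mu = 0, six 1's if mu = 1
}


def phase22_payoff(report, belief, eps):
    """Expected sum of the 8 stage payoffs conditional on mu != *.

    belief is player 1's posterior on s = 1, mu equals s with probability
    1 - eps, and a stage pays 1 when the action matches the state.
    """
    total = Fraction(0)
    for s, prob_s in ((1, belief), (0, 1 - belief)):
        for mu in (0, 1):
            prob_mu = 1 - eps if mu == s else eps
            zeros = ZEROS[report][mu]
            total += prob_s * prob_mu * (zeros if s == 0 else 8 - zeros)
    return total


eps = Fraction(1, 100)
# Hypothetical posteriors on s = 1 after l_s, l*_s and lbar_s.
beliefs = {"l": Fraction(1, 4), "l*": Fraction(1, 2), "lbar": Fraction(3, 4)}
for signal, belief in beliefs.items():
    payoffs = {r: phase22_payoff(r, belief, eps) for r in ZEROS}
    print(signal, "->", max(payoffs, key=payoffs.get))   # prints l -> l, l* -> l*, lbar -> lbar
```

In this sketch the dependence on $\mu$ is what makes incentives strict. With a report-independent count of six 0's after $l_s$, a player with posterior $1/2$ would earn $6 - 4 \cdot 1/2 = 4$ from reporting $l_s$, exactly as much as from reporting $l^*_s$.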

We conclude by explaining the equilibrium logic. Assuming that player $i$ chooses to truthfully report in phase 1, the computation of the continuation profile in phase 4 ensures that player $i$ is indifferent between all messages in phases 2.1 and 3, and has no profitable deviation in phase 4. However, the prescribed sequence of actions in phase 2.2 may involve suboptimal actions, and player $i$ may consider deviating from this sequence. Such a deviation is observable and triggers myopic play, so it forfeits the information to be received in phase 3. It therefore fails to be profitable as soon as the marginal value of the information received in phase 3 compensates the cost incurred along phase 2.2. This condition puts a (mild) constraint on the relation between the duration of phase 2 and the amount of information disclosed in phase 3.

What if player 1, say, chooses instead to misreport his signal in phase 1, and claims that he received the signal $l'_s$ when in fact receiving $l_s$? If $l_s$ and $l'_s$ are equivalent, in the sense that the belief of player 1 is the same following either signal, $p(\cdot \mid l_s) = p(\cdot \mid l'_s)$, such a deviation is irrelevant.¹³ Hence, assume that player 1 reports a signal $l'_s$ that is not equivalent to $l_s$, and plays consistently with his report thereafter.¹⁴ Three properties combine to ensure that such a deviation fails to be profitable. First, observe that the joint distribution of the two messages received from player 2 (in phases 2 and 3) does not depend on player 1's report. Hence, misreporting does not affect the information received from player 2 in phases 2 and 3. Next, observe that the continuation strategy of player 2 in phase 4 entails almost no information disclosure. As we show in the Appendix, this implies that the best-reply continuation payoff of player 1 in phase 4 is at most $u_*(\hat p) + (1-\delta)C$, for some positive $C$, where $\hat p$ is the 'true' belief of player 1. That is, in spite of the fact that player 2's continuation strategy is based on a wrong assessment of player 1's belief,¹⁵ the maximal gain for player 1 is very small. Finally, by design, the expected payoff received by player 1 in phase 2 is strictly higher when reporting $l_s$ than $l'_s$.

Because the weight of phase 2 in the total payoff is $\theta$, the small gain obtained in phase 4 cannot offset the loss incurred in phase 2.2 when reporting $l'_s$ rather than $l_s$.

¹² The cost of which is not taken into account in the bonus.



3 Extensions and concluding comments 

3.1 Ex ante vs. interim valuable information 

We here discuss how to adapt the statement of the Main Theorem to a situation in which information is ex ante valuable, but need not be interim valuable. That is, we assume that both inequalities $u_* < u_{**}$ and $v_* < v_{**}$ hold, but that the information held by player $i$ need not be valuable in the sense of Definition 1.

We will assume that with $p$-probability 1, player 1 has a unique myopically optimal action at $p_{l_S}$, and that the symmetric property¹⁶ holds for player 2. Define $\underline L_S \subset L_S$ to be the set of signals $l_S$ for which the inequality in Definition 1 does not hold, and define $\underline M_T \subset M_T$ in a symmetric way. We argue that the limit set of sequential equilibrium payoffs is equal to the set

$$E := \bigl[u_*,\ u_*\,q(\underline M_T) + u_{**}\,q(M_T \setminus \underline M_T)\bigr] \times \bigl[v_*,\ v_*\,p(\underline L_S) + v_{**}\,p(L_S \setminus \underline L_S)\bigr].$$

Observe first that at any equilibrium, if $m_T \in \underline M_T$, player 2 realizes (interim) that the information held by player 1 has no value, and player 2 will then repeat his unique optimal action at $q_{m_T}$. Thus, conditional on $m_T$, and using the independence assumption A.2, player 1's expected payoff is at most $u_*$ if $m_T \in \underline M_T$, and at most $u_{**}$ if $m_T \notin \underline M_T$. Using the same argument for player 2, this shows that any equilibrium payoff vector lies in the set $E$.

¹³ We note that the definition of equivalent signals adopted here is quite specific to the binary case, and will be different in the general case.

¹⁴ The case where player 1 fails to play consistently, and deviates from the prescribed sequence, poses no specific problem. It is left to the Appendix.

¹⁵ Because it assumes that player 1 reported truthfully.

¹⁶ If a player has two myopically optimal actions at $p_{l_S}$, he can costlessly reveal information to the other player.

Conversely, consider the following class of strategy profiles. In the first two stages, each player $i$ 'tells' player $j$ whether the information held by $j$ has positive value to $i$ or not. This is done as follows. In stage 1, player $i$ plays his myopically optimal action. In stage 2, player 1, say, repeats this action if $l_S \in \underline L_S$, and switches to a different (suboptimal) action if $l_S \notin \underline L_S$, to signal his willingness to disclose/acquire information. If both players switched in stage 2, they implement from stage 3 on an equilibrium such as the one we designed in the proof of the Main Theorem. Otherwise, players repeat their stage 1 action. The sole role of stage 1 is to instruct the other player how to interpret the action played in stage 2.

If $l_S \in \underline L_S$, it is strictly dominant for player 1 to repeat his optimal action throughout, as required. Indeed, playing a different action in stage 2 would only lower player 1's payoff, with no benefit since player 2's information is valueless.

If $l_S \in L_S \setminus \underline L_S$, player 1's overall payoff is $u_*(p_{l_S})$ if he pretends that the information held by player 2 is valueless. However, because there is a positive $q$-probability that $m_T \notin \underline M_T$, it is a best reply for player 1 to switch to a suboptimal action in stage 2 as soon as the value of the information disclosed by player 2 exceeds, on average, the cost incurred in stage 2.¹⁷

In such an equilibrium, conditional on $l_S$, player 1's payoff is $u_*(p_{l_S})$, which is then also equal to $\mathbf{E}[u_*(p) \mid l_S]$ if $l_S \in \underline L_S$. If instead $l_S \notin \underline L_S$, then with probability $q(M_T \setminus \underline M_T)$, player 1's payoff may be as high as $\mathbf{E}[u_*(p) \mid l_S]$. Otherwise, player 1's payoff will be (approximately) $u_*(p_{l_S})$. The ex ante expected payoff can therefore be as high as

$$u_*\,q(\underline M_T) + q(M_T \setminus \underline M_T)\,u_{**}.$$

¹⁷ It cannot be optimal for player 1 to pretend that $l_S \notin \underline L_S$, yet to lie about his optimal action.



3.2 The correlated case 

The case where the independence assumption A.2 does not hold raises significant challenges. For the sake of simplicity, we focus here on the binary setup.

A first difficulty is that myopic play need not be an equilibrium. Hence it is not clear what the lowest equilibrium payoff is.

Binary Example 3. Here, states and signals are functions of an auxiliary variable $\omega$, which can assume four different possible values $\omega_i$, $i \in \{1, 2, 3, 4\}$. The probabilities of the different values, and the way $\omega$ determines the states and signals, are as follows:
When behaving in a myopically optimal way, players play as follows. In stage 1, player 2 plays $b = 0$. Indeed, the probability he assigns to state $t = 0$ is either 1 or, in the other case, still large enough for $b = 0$ to be optimal. Meanwhile, player 1 plays $a = 1$ following one of his two signals and $a = 0$ following the other. Indeed, the probability assigned to state $s = 0$ is below $1/2$ in the former case, and above $1/2$ in the latter case. In stage 2, player 1 repeats his stage 1 action. On the other hand, player 2 can deduce $\omega$ from player 1's stage 1 choice. If player 1 played $a = 1$, player 2 repeats his stage 1 action. If instead player 1 played $a = 0$, player 2 will play either $b = 0$ or $b = 1$, depending on player 2's information. In the former case, players repeat forever their stage 2 action. In the latter, player 1 deduces $\omega$ from player 2's stage 2 action, and obtains a payoff of 1 in all later stages. This obviously creates an incentive for player 1 to deviate in stage 1, and to always play $a = 0$, in order to learn the value of $\omega$ in stage 3. Thus, it is not an equilibrium to behave myopically. In this example, information exchange is a consequence of equilibrium behavior.
A second difficulty lies in understanding the extent to which information can be exchanged. In Section 2.1.1 we described specific periodic equilibria. In fact, there are many degrees of freedom in this construction, and the choice of using periodic equilibria was made to facilitate the computation of the equilibrium requirements. It turns out that when information is correlated there are no degrees of freedom in determining the beliefs, and the sequence of beliefs is not periodic: it satisfies uninspiring recursive equations that do not seem to have closed-form solutions. Preliminary numerical evidence seems to suggest that some equilibria can be constructed along these lines.

3.3 The finite horizon case 

We here comment on the assertion that the exchange of information relies on the horizon being unbounded. We argue that, when the horizon is finite, the myopic equilibrium is the unique Nash equilibrium of the binary example, as soon as $p_m \neq 1/2$ $p$-a.s., and $q_l \neq 1/2$ $q$-a.s. This claim illustrates that, in many cases of interest, a bounded horizon prevents players from exchanging information.

Let $T$ be the length of the game, and let a Nash equilibrium be given. We prove that, for any sequence $h_n$ of length $n \ge 0$ of moves that occur with positive probability, the continuation payoff of player 1 is $u_*(p_n)$ if $q_n \neq 1/2$, and the continuation payoff of player 2 is $v_*(q_n)$ if $p_n \neq 1/2$. We argue by backward induction over $n$. The claim holds trivially for $n = T$. Assume that the claim holds for every history of length $n + 1$, and consider a history of length $n$ such that $q_n \neq 1/2$. We proceed in two steps.

Step 1: We claim that $\mathbf{E}[u_*(p_{n+1}) \mid h_n] = u_*(p_n)$.

Note first that the claim trivially holds if player 2 does not randomize following $h_n$, since one then has $p_{n+1} = p_n$ with probability 1.

Assume now that player 2 randomizes at stage $n$ following $h_n$. Observe that $p_{n+1}$ must then be equal to $1/2$ with positive probability. Otherwise, by the induction hypothesis, the continuation payoff of player 2 would be $v_*(q_{n+1})$, irrespective of the action played by player 2 in stage $n$. But then player 2 would not be indifferent in stage $n$ between the two actions (since $q_n \neq 1/2$, one of them yields a strictly higher stage payoff), a contradiction.

It follows that both possible values of $p_{n+1}$ lie in the same half of the interval $[0, 1]$. The claim follows, because $\mathbf{E}[p_{n+1} \mid h_n] = p_n$ and because $u_*$ is affine on both $[0, 1/2]$ and $[1/2, 1]$.
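Step 1 rests on the fact that $u_*(p) = \max(p, 1-p)$ is affine on each half of $[0, 1]$. The Python sketch below, with hypothetical helper names, splits a belief into two posteriors with martingale weights and compares $\mathbf{E}[u_*]$ with $u_*$ of the prior. When one posterior is $1/2$ there is no gain, whereas posteriors on both sides of $1/2$ yield a strict gain.

```python
from fractions import Fraction


def u_star(p):
    """Myopic value in the binary example, max(p, 1 - p)."""
    return max(p, 1 - p)


def expected_u_after_split(p, lo, hi):
    """E[u*(p')] when the belief p splits into lo < p < hi (martingale weights)."""
    w_hi = (p - lo) / (hi - lo)
    return w_hi * u_star(hi) + (1 - w_hi) * u_star(lo)


half = Fraction(1, 2)
# One posterior equals 1/2: both posteriors lie in the same half, no gain.
print(expected_u_after_split(Fraction(3, 5), half, Fraction(9, 10)))            # 3/5
print(expected_u_after_split(Fraction(1, 5), Fraction(1, 10), half))            # 4/5
# Posteriors on both sides of 1/2: strict gain over u*(3/5) = 3/5.
print(expected_u_after_split(Fraction(3, 5), Fraction(1, 5), Fraction(4, 5)))  # 4/5
```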
Step 2: Conclusion.

Since $q_n \neq 1/2$, the event $\{q_{n+1} \neq 1/2\}$ has positive probability conditional on $h_n$ (otherwise $q_n = \mathbf{E}[q_{n+1} \mid h_n]$ would equal $1/2$). Let $a \in A$ be any action that is played with positive probability at $h_n$, and following which $q_{n+1} \neq 1/2$. When playing $a$, and by the induction hypothesis, the continuation payoff of player 1 in stage $n + 1$ is equal to $u_*(p_{n+1})$. As a consequence, when playing $a$, the continuation payoff of player 1 in stage $n$ is equal to $(1-\delta)\,u(p_n, a) + \delta\,\mathbf{E}[u_*(p_{n+1}) \mid h_n] \le u_*(p_n)$, using Step 1. Since $a$ is in the support of player 1's equilibrium strategy, and since player 1 can secure at least $u_*(p_n)$ by playing myopically from stage $n$ on (his beliefs form a martingale and $u_*$ is convex), it follows that the continuation payoff of player 1 in stage $n$ (following $h_n$) is equal to $u_*(p_n)$.

We note, though, that there may be cases in which information exchange is possible, even when the game is finitely repeated. The following example illustrates this point.

Consider the binary example, and add to each action set one more action, which we denote by 2, and which yields payoff $2/3$ irrespective of the state. The optimal action of player 1, as a function of the belief assigned to state $s = 1$, is given by Figure 3 below, and the structure of player 2's best response is similar.



Figure 3: The optimal action of player 1 (horizontal axis: belief assigned to state $s = 1$, with marks at $1/3$ and $2/3$).

Example 1. Assume that the two states are equally likely, and that player 1 learns $t$ while player 2 learns $s$. Suppose that in stage 1 player 1 plays $[\tfrac23(a_0), \tfrac13(a_1)]$ if $t = 0$, and $[\tfrac13(a_0), \tfrac23(a_1)]$ if $t = 1$, and suppose that player 2 plays in an analogous way. Player 1's belief in stage 2 is either $[\tfrac23(0), \tfrac13(1)]$ or $[\tfrac13(0), \tfrac23(1)]$, depending on player 2's action in stage 1. In the former case, we let player 1 play either $a_0$ or $a_2$, depending on $t$. In the latter case, we let player 1 play either $a_1$ or $a_2$, depending on $t$.¹⁸ Let the behavior of player 2 in stage 2 be analogous. Provided $\delta$ is high enough, this strategy pair is an equilibrium, in which players exchange all information in two stages.
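The thresholds and beliefs of this example can be reproduced under two assumptions. The first is that the extra action pays $2/3$ in every state, the value consistent with the marks $1/3$ and $2/3$ of Figure 3. The second is that player 2's stage-1 mixture puts weight $2/3$ on $b_0$ when $s = 0$ and $1/3$ when $s = 1$. Both are our reading of the example rather than values stated independently in the text. The sketch below checks the resulting posteriors and optimal actions.

```python
from fractions import Fraction as F

PAYOFF_2 = F(2, 3)   # assumed state-independent payoff of the extra action 2


def stage_payoff(p, action):
    """Player 1's expected stage payoff at belief p = P(s = 1)."""
    return {0: 1 - p, 1: p, 2: PAYOFF_2}[action]


def optimal_actions(p):
    """Sorted list of player 1's myopically optimal actions at belief p."""
    best = max(stage_payoff(p, a) for a in (0, 1, 2))
    return sorted(a for a in (0, 1, 2) if stage_payoff(p, a) == best)


def posterior_s1(prior, prob_if_s1, prob_if_s0):
    """Bayes' rule for P(s = 1) after an action with the given state-dependent probabilities."""
    return prior * prob_if_s1 / (prior * prob_if_s1 + (1 - prior) * prob_if_s0)


# Assumed stage-1 mixture of player 2: b_0 w.p. 2/3 if s = 0 and w.p. 1/3 if s = 1.
after_b0 = posterior_s1(F(1, 2), F(1, 3), F(2, 3))
after_b1 = posterior_s1(F(1, 2), F(2, 3), F(1, 3))
print(after_b0, optimal_actions(after_b0))   # 1/3 [0, 2]
print(after_b1, optimal_actions(after_b1))   # 2/3 [1, 2]
print(optimal_actions(F(1, 2)))              # [2]
```

At the prior belief $1/2$, playing $a_0$ or $a_1$ instead of $a_2$ costs $1/6$ in stage 1 under these assumptions.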



The type of construction presented in the previous example is valid only for intermediate values of initial beliefs, and for self-ignorant games. Moreover, it is not clear how to use such profiles as continuation profiles in general games.



¹⁸ We let player 1 repeat $a_2$ forever if player 2 played $b_2$.
3.4 Signals along the Play

How do our results change when players receive independent signals, both on their own state and on the other player's state, along the game? If player $i$ receives signals on his own state along the play, then, because he is patient, he will wait until almost all the information that he can get about his own state is received. However, if in subsequent stages he keeps on receiving information on his own state, then his belief changes, even though the changes are small. These changes have the effect that the other player does not know how to compensate player $i$ for playing suboptimally, and our construction fails.

If, on the other hand, player $i$ receives signals on the other player's state along the game, then again the players can wait until player $i$ receives almost all the information that he can, and then the players can implement the equilibrium that we construct. Thus, our results remain valid in this case.
A Self-Ignorant Games

We assume throughout the Appendix that all payoffs are in $[0, 1]$. In this section we assume that no player receives private information on his own state. Equivalently, both sets $L_S$ and $M_T$ are singletons. For simplicity, we here write $L$ and $M$ instead of $L_T$ and $M_S$.

Let initial distributions $p \in \Delta(S \times M)$ and $q \in \Delta(T \times L)$ be given. We denote the corresponding game by $\Gamma(p, q)$. W.l.o.g., we assume that $p(m) > 0$ for each $m \in M$.¹⁹ We assume that the information of each player $i$ is valuable to player $j$. Since $\Gamma(p, q)$ is a self-ignorant game, this is equivalent to assuming that there is no action $a \in A$ that is optimal at all distributions $p_m := p(\cdot \mid m)$, $m \in M$. Equivalently, one has

$$u_* = u_*(p) < u_{**} = \sum_{m \in M} p(m)\,u_*(p_m). \tag{4}$$
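Condition (4) can be illustrated with the binary payoffs (1 when the action matches the state). In the sketch below the joint laws of $(s, m)$ are hypothetical. In the first one, player 2's signal changes player 1's optimal action and $u_* < u_{**}$. In the second, action 1 is optimal at every $p_m$ and the two values coincide, so the information is not valuable.

```python
from fractions import Fraction as F


def u(s, a):
    """Stage payoff: 1 if the action matches the state (binary example, assumed)."""
    return 1 if a == s else 0


def myopic_value(joint, actions=(0, 1)):
    """u_*(p) = max_a sum_{s,m} p(s, m) u(s, a): best action without knowing m."""
    return max(sum(w * u(s, a) for (s, m), w in joint.items()) for a in actions)


def full_info_value(joint, actions=(0, 1)):
    """u_** = sum_m p(m) u_*(p_m) = sum_m max_a sum_s p(s, m) u(s, a)."""
    signals = {m for (_, m) in joint}
    return sum(
        max(sum(w * u(s, a) for (s, mm), w in joint.items() if mm == m) for a in actions)
        for m in signals
    )


# Hypothetical joint laws of (s, m), where m is player 2's signal.
valuable = {(1, "m1"): F(2, 5), (0, "m1"): F(1, 10), (1, "m2"): F(1, 10), (0, "m2"): F(2, 5)}
not_valuable = {(1, "m1"): F(1, 2), (0, "m1"): F(1, 10), (1, "m2"): F(1, 4), (0, "m2"): F(3, 20)}
print(myopic_value(valuable), full_info_value(valuable))          # 1/2 4/5
print(myopic_value(not_valuable), full_info_value(not_valuable))  # 3/4 3/4
```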

By Bayes' rule, the belief $p_n$ of player 1 at stage $n$ is the weighted average of $\{p_m \otimes 1_m,\ m \in M\}$, where the weight of $p_m \otimes 1_m$ is equal to the probability that the signal of player 2 is $m$, given player 1's information at stage $n$.²⁰ Thus, the set of possible values of $p_n$ is

$$\Delta^p(S \times M) = \operatorname{conv}\{p_m \otimes 1_m,\ m \in M\}. \tag{5}$$

Because $p(m) > 0$, $p$ lies in the (relative) interior of $\Delta^p(S \times M)$, which we denote by $\mathring{\Delta}^p(S \times M)$. We define $q_l = q(\cdot \mid l)$ for $l \in L$, and the set $\Delta^q(T \times L)$ is defined in a symmetric way.

It is convenient to allow the initial distribution to vary, to account for the fact that beliefs may change along the play. Since all beliefs lie in $\Delta^p(S \times M)$ and $\Delta^q(T \times L)$, we will only consider initial distributions in these sets. We still denote arbitrary such distributions by $p$ and $q$.

In this section, we prove the two propositions below. 



¹⁹ And that $q(l) > 0$ for each $l \in L$: to avoid useless repetitions, we sometimes state properties for player 1, with the implicit understanding that analogous properties hold for player 2 as well.

²⁰ This is a way of stating that, as the play proceeds, the belief of player 1 on $m$ evolves, but the distribution of $s$ conditional on $m$ remains fixed.
Proposition 1. Let $p \in \mathring{\Delta}^p(S \times M)$ and $q \in \mathring{\Delta}^q(T \times L)$ be given. There exist $\varepsilon > 0$ and $\bar\delta < 1$ such that the following holds. For every $\delta > \bar\delta$, every payoff vector in $[u_*(p), u_*(p) + \varepsilon] \times [v_*(q), v_*(q) + \varepsilon]$ is a sequential equilibrium payoff of $\Gamma(p, q)$.

Given such $p$ and $q$, a payoff vector $\gamma \in [u_*(p), u_*(p) + \varepsilon] \times [v_*(q), v_*(q) + \varepsilon]$, and a discount factor $\delta$, we will construct an equilibrium profile $(\sigma_{p,q,\gamma}, \tau_{p,q,\gamma})$ in $\Gamma(p, q)$, with payoff $\gamma$. Proposition 2 bounds the possible gain of player 1 if player 2 has an incorrect belief on $p$, provided $\gamma$ is close to the myopically optimal payoff.

Proposition 2. Let $p \in \mathring{\Delta}^p(S \times M)$, $q \in \mathring{\Delta}^q(T \times L)$ and $c > 0$ be given. There exists a constant $C > 0$ with the following property. For every discount factor $\delta$, every payoff vector $\gamma$ such that $\|\gamma - (u_*(p), v_*(q))\|_\infty < (1-\delta)c$, and every $p' \in \Delta^p(S \times M)$, one has

$$\gamma^1(p', q, \sigma, \tau_{p,q,\gamma}) \le u_*(p') + (1-\delta)C, \quad \text{for every strategy } \sigma.$$



In Propositions 1 and 2, $\varepsilon$, $C$ and $\bar\delta$ may depend a priori on the choice of $(p, q)$. We will prove that they can be chosen in such a way that the conclusions hold uniformly throughout some neighborhoods of $p$ and $q$.

A.1 Notations and Preliminaries

We here describe the main steps leading to the proof of Propositions 1 and 2. Our goal is to mimic the recursive construction of the binary case. We let $[p^0, p^1]$ be any segment in the interior of $\Delta^p(S \times M)$ such that $u_*$ is not affine on the segment $[p^0, p^1]$. The beliefs $p^0$ and $p^1$ take the role of $p$ and $1 - p$ in the binary case.

An optimal action $a$ at $p^0$ is not optimal at $p^1$ (and vice versa). Otherwise, since $u_*$ is convex and $u(\cdot, a) \le u_*$ is affine, $a$ would be optimal throughout the segment $[p^0, p^1]$, and then $u_*$ would coincide with the affine map $u(\cdot, a)$ on that segment.

For $k = 0, 1$, we let $a^k \in A$ be an optimal action at $p^k$. We denote by $D^p$ the straight line spanned by $p^0$ and $p^1$ in $\mathbb{R}^{S \times M}$, and we denote by $\underline p$ and $\bar p$ the endpoints of the segment $D^p \cap \Delta^p(S \times M)$, with the convention of Figure 4 ($\underline p$ lies on the side of $p^0$, and $\bar p$ on the side of $p^1$).
Figure 4 (panel label: $\Delta(S)$).

Let $\pi \in [p^0, \underline p]$, and assume that player 1 receives information that changes his belief from $p^0$ to either $p^1$ (with probability $y$) or $\pi$ (with probability $1 - y$). So that the martingale property of beliefs holds, we must have $p^0 = y p^1 + (1-y)\pi$. Assume moreover that from the next stage on player 1 receives his myopically optimal payoff. The gain of player 1 from the information that is revealed to him, relative to his myopically optimal payoff at $p^0$, is then

$$h_{p^0}(\pi) = \bigl(y\,u_*(p^1) + (1-y)\,u_*(\pi)\bigr) - u_*(p^0).$$

Since $u_*$ is convex, $h_{p^0}(\pi) \ge 0$ for each $\pi$. Since $u_*$ is not affine on the interval $[p^0, p^1]$, one also has $h_{p^0}(\pi) > 0$ for $\pi \in (p^0, \underline p]$; see Figure 5. In addition, $h_{p^0}$ is piecewise affine, and non-decreasing as $\pi$ moves away from $p^0$ towards $\underline p$.
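In the binary case, where a belief is summarized by the probability of $s = 1$, the gain function can be tabulated directly. The sketch below uses the illustrative values $p^0 = 3/5$ and $p^1 = 2/5$, so that $\underline p = 1$. It checks that $h_{p^0}$ vanishes at $p^0$, is positive elsewhere on $[p^0, \underline p]$, and is non-decreasing as $\pi$ moves towards $\underline p$.

```python
from fractions import Fraction as F


def u_star(x):
    """Myopic value when beliefs are summarized by P(s = 1): max(x, 1 - x)."""
    return max(x, 1 - x)


def h(p0, p1, pi):
    """Gain h_{p0}(pi) from splitting p0 into p1 (prob. y) and pi (prob. 1 - y)."""
    y = (pi - p0) / (pi - p1)            # solves p0 = y p1 + (1 - y) pi
    return y * u_star(p1) + (1 - y) * u_star(pi) - u_star(p0)


p0, p1 = F(3, 5), F(2, 5)                # p and 1 - p, as in the binary case
grid = [p0 + F(k, 50) for k in range(21)]   # pi moves from p0 to the endpoint 1
values = [h(p0, p1, pi) for pi in grid]
print(values[0], values[-1])            # 0 2/15
```

In this example one can check that $h_{p^0}(\pi) = y\,(p^0 - p^1)$, which is affine in the splitting probability $y$.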



Figure 5 (labels: $y\,u_*(p^1) + (1-y)\,u_*(\pi)$ and $h(y)$).

Similarly, define $h_{p^1} : [p^1, \bar p] \to \mathbb{R}_+$ by $h_{p^1}(\pi) = \bigl(y\,u_*(p^0) + (1-y)\,u_*(\pi)\bigr) - u_*(p^1)$, where $y$ solves $p^1 = y p^0 + (1-y)\pi$.
We proceed in a symmetric way with player 2. We let $[q^0, q^1]$ be an arbitrary segment in the interior of $\Delta^q(T \times L)$ such that the restriction of $v_*$ to the segment $[q^0, q^1]$ is not an affine map. We denote by $D^q$ the straight line in $\mathbb{R}^{T \times L}$ spanned by $q^0$ and $q^1$, and by $\underline q$, $\bar q$ the endpoints of the segment $D^q \cap \Delta^q(T \times L)$. Finally, we define $h_{q^0} : [q^0, \underline q] \to \mathbb{R}_+$ and $h_{q^1} : [q^1, \bar q] \to \mathbb{R}_+$ by adapting the definitions of $h_{p^0}$ and $h_{p^1}$.

Given a belief $\pi \in \Delta(S \times M)$ and an action $a \in A$, the cost of $a$ at $\pi$ is defined as the loss incurred when playing $a$ instead of the optimal action at $\pi$:

$$c(\pi, a) := u_*(\pi) - u(\pi, a).$$

The cost $c(\pi, b)$, for $\pi \in \Delta(T \times L)$ and $b \in B$, is defined analogously.

The proof of Propositions 1 and 2 relies on Lemmas 1 and 2 below.

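Lemma 1. Let $\delta < 1$ be such that $\frac{1-\delta}{\delta^2}\,c(p^i, a^{1-i}) < \max h_{p^i}$ and $\frac{1-\delta}{\delta^2}\,c(q^j, b^{1-j}) < \max h_{q^j}$, for $i, j = 0, 1$. Then the vector $\bigl(u_*(p^0),\ v_*(q^0) + \frac{1-\delta}{\delta}\,c(q^0, b^1)\bigr)$ is a sequential equilibrium payoff of $\Gamma(p^0, q^0)$.

Lemma 2. Let $\varepsilon > 0$ be such that $\varepsilon < \max h_{p^i}$ and $\varepsilon < \max h_{q^j}$ for $i, j = 0, 1$. There is $\bar\delta < 1$ such that, for every discount factor $\delta > \bar\delta$, every payoff in $[u_*(p^0), u_*(p^0) + \varepsilon] \times [v_*(q^0), v_*(q^0) + \varepsilon]$ is a sequential equilibrium payoff of $\Gamma(p^0, q^0)$.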

We will prove that the conclusion holds uniformly for all initial distributions close to $p^i$ and $q^j$. To be precise, there exist a neighborhood $V(p^i)$ of $p^i$ and a neighborhood $V(q^j)$ of $q^j$ ($i, j \in \{0, 1\}$) such that, for every $\delta > \bar\delta$, and every $\tilde p^i \in V(p^i)$, $\tilde q^j \in V(q^j)$, all vectors in $[u_*(\tilde p^i), u_*(\tilde p^i) + \varepsilon] \times [v_*(\tilde q^j), v_*(\tilde q^j) + \varepsilon]$ are sequential equilibrium payoffs of $\Gamma(\tilde p^i, \tilde q^j)$.

A.2 Proof of Lemma 1

In the construction of Section 2, the probabilities $x$ and $y$ assigned to suboptimal actions were pinned down by equilibrium requirements. The construction here is slightly more involved because the number of signals may be larger than 2.

We let $\delta$ be as stated. Define first $\bar p^0 \in [p^0, p)$ by the condition $h_{p^0}(\bar p^0) = \frac{1-\delta}{\delta^2}\, c(p^0, a^1)$, and $y^1 \in [0, 1)$ by the equality $p^0 = y^1 p^1 + (1 - y^1) \bar p^0$. The revelation of information just defined offsets the cost to player 1 of playing the suboptimal action $a^1$ when the belief is $p^0$: $\bar p^0$ takes the role of $p^*$ in the binary case. For $m \in M$, we set

$$y^1_m := \frac{p^1(m)}{p^0(m)}\, y^1.$$

Because $p^1$ and $\bar p^0$ are in the (relative) interior of $\Delta^*(S \times M)$, one has $p^1(m) > 0$ and $\bar p^0(m) > 0$ for each $m$, so that $y^1 p^1(m) < p^0(m)$ and $y^1_m \in (0, 1)$. Observe that $y^1 = \sum_{m \in M} p^0(m)\, y^1_m$, and that the following Bayesian updating property holds. If player 1's belief is $p^0$, and if player 2 plays two different actions $b$ and $b'$ with respective probabilities $y^1_m$ and $1 - y^1_m$, then following $b$ the posterior belief of player 1 is equal to $p^1$, and it is equal to $\bar p^0$ following $b'$.
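The Bayesian updating property can be verified with exact rational arithmetic. The sketch below is a minimal illustration under the assumption that only the marginal of the belief on $M$ matters for the update (player 2's randomization depends on $m$ only); the marginals `p0`, `p1` and the weight `y` are hypothetical numbers.

```python
from fractions import Fraction


def split_probabilities(p0, p1, y):
    """Type-dependent probabilities y_m = p1(m) / p0(m) * y of playing action b.

    p0, p1 are the marginals over M of the current belief and of the target belief;
    y is the weight of p1 in the split p0 = y * p1 + (1 - y) * p_bar.
    """
    return {m: p1[m] / p0[m] * y for m in p0}


def posterior(prior, likelihood):
    """Bayes' rule over M: posterior(m) is proportional to prior(m) * likelihood(m)."""
    total = sum(prior[m] * likelihood[m] for m in prior)
    return {m: prior[m] * likelihood[m] / total for m in prior}


# Hypothetical marginals over M = {0, 1}.
p0 = {0: Fraction(1, 2), 1: Fraction(1, 2)}
p1 = {0: Fraction(3, 4), 1: Fraction(1, 4)}
y = Fraction(1, 2)
p_bar = {m: (p0[m] - y * p1[m]) / (1 - y) for m in p0}  # other end of the split

y_m = split_probabilities(p0, p1, y)
after_b = posterior(p0, y_m)                                 # equals p1
after_b_prime = posterior(p0, {m: 1 - y_m[m] for m in p0})  # equals p_bar
print(after_b, after_b_prime)
```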

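Similarly, we let $\bar p^1 \in (p^1, p)$ be defined by $h_{p^1}(\bar p^1) = \frac{1-\delta}{\delta^2}\, c(p^1, a^0)$, and we set

$$y^0_m := \frac{p^0(m)}{p^1(m)}\, y^0 \quad \text{for } m \in M,$$

where $y^0$ solves $p^1 = y^0 p^0 + (1 - y^0) \bar p^1$.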



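We next exchange the roles of the two players, and proceed in a slightly asymmetric way. We let $\bar q^0 \in (q^0, q)$ be defined by $h_{q^0}(\bar q^0) = \frac{1-\delta}{\delta^2}\, c(q^0, b^1)$, we let $x^1$ be defined by $q^0 = x^1 q^1 + (1 - x^1) \bar q^0$, and we set

$$x^1_l := \frac{q^1(l)}{q^0(l)}\, x^1 \quad \text{for } l \in L.$$

We finally define $\bar q^1 \in (q^1, q)$, $x^0 \in (0, 1)$, and $x^0_l := \frac{q^0(l)}{q^1(l)}\, x^0$ for $l \in L$ in a similar way.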

We are now in a position to define the strategies $\sigma_\delta$ and $\tau_\delta$. As long as players alternate in playing their suboptimal action, player 1 (resp., player 2) randomizes in each odd (resp., in each even) stage, and beliefs evolve cyclically:

$$(p^0, q^0) \to (p^0, q^1) \to (p^1, q^1) \to (p^1, q^0) \to (p^0, q^0) \to \cdots$$

Along this cycle, player 1 assigns a probability $x^1_l$ to his suboptimal action $a^1$ when player 2's belief is $q^0$, and a probability $x^0_l$ to his suboptimal action $a^0$ when player 2's belief is $q^1$. Analogous properties hold for player 2. This is summarized in Figure 6 below.



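| Stage | Player 1 | Player 2 | Beliefs | Suboptimal action |
|---|---|---|---|---|
| 1 mod 4 | $[x^1_l(a^1), (1 - x^1_l)(a^0)]$ | $b^0$ | $p^0, q^0$ | $a^1$ |
| 2 mod 4 | $a^0$ | $[y^1_m(b^0), (1 - y^1_m)(b^1)]$ | $p^0, q^1$ | $b^0$ |
| 3 mod 4 | $[x^0_l(a^0), (1 - x^0_l)(a^1)]$ | $b^1$ | $p^1, q^1$ | $a^0$ |
| 0 mod 4 | $a^1$ | $[y^0_m(b^1), (1 - y^0_m)(b^0)]$ | $p^1, q^0$ | $b^1$ |

Figure 6: The first phase of play: information exchange.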
As soon as either player 1 plays his optimal action in some odd stage, or player 2 plays his optimal action in some even stage, the players switch to myopic play forever, as described in columns 3 and 4 of Figure 7. Here and later, $o(p)$ (resp. $o(q)$) stands for an optimal action of player 1 at $p$ (resp., of player 2 at $q$).



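| First stage in which a myopically optimal action is played | New beliefs | Player 1 | Player 2 |
|---|---|---|---|
| 1 mod 4 | $p^0, \bar q^0$ | $a^0$ | $o(\bar q^0)$ |
| 2 mod 4 | $\bar p^0, q^1$ | $o(\bar p^0)$ | $b^1$ |
| 3 mod 4 | $p^1, \bar q^1$ | $a^1$ | $o(\bar q^1)$ |
| 0 mod 4 | $\bar p^1, q^0$ | $o(\bar p^1)$ | $b^0$ |

Figure 7: The second phase of play: myopic play.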

We complete the definition of $(\sigma_\delta, \tau_\delta)$ by specifying actions and beliefs at information sets that are ruled out by $(\sigma_\delta, \tau_\delta)$. For concreteness, we focus on player 1. An information set of player 1 contains all histories of the form $(l, h)$, for a fixed signal $l \in L$ and a fixed sequence $h \in H$ of moves. Fix an information set that is reached with probability 0 under $(\sigma_\delta, \tau_\delta)$. We denote it by $I_{l,h}$, with $h \in H$. Write $h = (h', (a, b))$, so that $h'$ is the longest proper prefix of $h$, and assume that $I_{l,h'}$ is reached with positive probability.

We distinguish two cases. Assume first that the action $b$ has probability zero conditional on $h'$. This is the case where player 2 deviates in an observable way at $h'$. We let the belief of player 1 at $I_{l,h}$ be equal to the belief held at $I_{l,h'}$: the deviation by player 2 is interpreted as being non-informative about $m$. Assume now that $b$ is played with positive probability at $h'$. In that case, the belief of player 1 at $I_{l,h}$ can be computed by Bayes' rule from the belief held at $I_{l,h'}$.

In both cases, we let the belief at all subsequent information sets be equal to the belief at $I_{l,h}$, and we let $\sigma_\delta$ repeat forever any action that is optimal at $I_{l,h}$.

Observe that, following any history in $I_{l,h}$, under $\tau_\delta$ player 2 repeats forever the same action. Indeed, either the sequence $h$ of actions has probability 0, or it has positive probability. In the former case, the claim follows from the definition of $\tau_\delta$ at zero-probability information sets. In the latter case, the information set $I_{l',h}$ has positive probability, for some $l' \neq l$. Since the support of player 1's mixed actions in the information phase does not depend on his signal, this implies that $I_{l',h}$ must belong to the myopic play phase. Using this observation, one can check that beliefs are consistent with the strategy profile $(\sigma_\delta, \tau_\delta)$. We omit the proof.

Footnote: To be precise, player 2 plays the same action at $I_{l,h}$ and in all subsequent information sets.

Note that the strategy $\sigma_\delta$ is sequentially rational at any $I_{l,h}$ that is reached with probability 0. Indeed, since the belief of player 1 is the same at $I_{l,h}$ and at all subsequent information sets, it is a best reply to repeat any action that is optimal at $I_{l,h}$.
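Lemma 3 The profile $(\sigma_\delta, \tau_\delta)$ is a sequential equilibrium of $\Gamma(p^0, q^0)$, with payoff $\left(u_*(p^0),\ v_*(q^0) + \frac{1-\delta}{\delta}\, c(q^0, b^1)\right)$.

We will use this lemma for various distributions $p^0, q^0$. To avoid confusion, we will then denote the profile by $(\sigma_\delta^{p^0,q^0}, \tau_\delta^{p^0,q^0})$.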

Proof. Each of the strategies $\sigma_\delta$ and $\tau_\delta$ can be described by an automaton with 8 states: four states that implement the periodic play in Figure 6, and four states that implement the myopic play in Figure 7.

In addition, transitions between (automaton) states are deterministic and depend only on the public history of moves. Hence, player $i$ can always compute the current state of player $j$'s automaton. Moreover, as can be verified inductively, the belief of player $i$ following any public history $h$ of moves depends only on the current state of player $j$'s automaton.

It follows that player $i$ has a best response that can be implemented by an automaton with the same (or a smaller) number of states as the automaton of player $j$. The dynamic programming principle may be used to identify such a best response. Using this principle, it is routine to verify that $\tau_\delta$ is a best response against $\sigma_\delta$, and vice versa. Indeed, denoting the 8 states of the automata by $\Omega = \{(1, \text{periodic}), (2, \text{periodic}), (3, \text{periodic}), (0, \text{periodic}), (1, \text{myopic}), (2, \text{myopic}), (3, \text{myopic}), (0, \text{myopic})\}$, the expected payoff to player 2 starting at any given $\omega \in \Omega$ is:



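$$
\begin{aligned}
V(1, \text{periodic}) &= v_*(q^0) + \tfrac{1-\delta}{\delta}\, c(q^0, b^1), & V(1, \text{myopic}) &= v_*(\bar q^0),\\
V(2, \text{periodic}) &= v_*(q^1), & V(2, \text{myopic}) &= v_*(q^1),\\
V(3, \text{periodic}) &= v_*(q^1) + \tfrac{1-\delta}{\delta}\, c(q^1, b^0), & V(3, \text{myopic}) &= v_*(\bar q^1),\\
V(0, \text{periodic}) &= v_*(q^0), & V(0, \text{myopic}) &= v_*(q^0).
\end{aligned}
$$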
One may verify that, for every $\omega \in \Omega$, $V$ solves

$$V(\omega) = \max_{b \in B} \Big\{ (1 - \delta)\, r(\omega, b) + \delta \sum_{\omega' \in \Omega} P(\omega, b)[\omega']\, V(\omega') \Big\}; \qquad (6)$$

here $r(\omega, b)$ stands for the expected payoff of player 2 when playing $b$ in the (automaton) state $\omega$, where the expectation is taken w.r.t. the belief held at state $\omega$, and $P(\omega, b)[\omega']$ is the probability that the automaton moves from $\omega$ to $\omega'$ when player 2 plays $b$.
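The dynamic programming check in (6) can be carried out mechanically by value iteration on the automaton states. The sketch below is generic: the states, actions, rewards `reward[(w, b)]` and transition probabilities `transition[(w, b)]` are hypothetical placeholders, not the eight-state automaton of the proof.

```python
def solve_bellman(states, actions, reward, transition, delta, tol=1e-12, max_iter=100_000):
    """Value iteration for V(w) = max_b {(1 - delta) r(w, b) + delta * sum_w' P(w, b)[w'] V(w')}.

    reward[(w, b)] is the expected stage payoff; transition[(w, b)] maps next states to
    probabilities. The operator is a delta-contraction, so the iteration converges.
    """
    values = {w: 0.0 for w in states}
    for _ in range(max_iter):
        new_values = {
            w: max(
                (1 - delta) * reward[(w, b)]
                + delta * sum(prob * values[w2] for w2, prob in transition[(w, b)].items())
                for b in actions
            )
            for w in states
        }
        if max(abs(new_values[w] - values[w]) for w in states) < tol:
            return new_values
        values = new_values
    raise RuntimeError("value iteration did not converge")


# Hypothetical two-state example: "hold" is absorbing; "move" leads to "hold" in one step.
states = ["hold", "move"]
actions = [0, 1]
reward = {("hold", 0): 1.0, ("hold", 1): 0.5, ("move", 0): 0.0, ("move", 1): 0.0}
transition = {(w, b): {"hold": 1.0} for w in states for b in actions}
V = solve_bellman(states, actions, reward, transition, delta=0.9)
print(V)
```

Checking that a candidate profile is a best response then amounts to verifying that the prescribed action attains the maximum in (6) at every state.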



A.3 Proof of Lemma 2

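Let a payoff vector $\gamma \in [u_*(p^0), u_*(p^0) + \varepsilon] \times [v_*(q^0), v_*(q^0) + \varepsilon]$ be given. For $\delta$ high enough, we will define a sequential equilibrium profile in $\Gamma(p^0, q^0)$ with payoff $\gamma$, using the ideas in Section 2.1.2. We need some preparations. Define $\gamma^1_s$ and $\gamma^1_o$ by the equations

$$\gamma^1 = (1 - \delta)\, u_*(p^0) + \delta (1 - \delta)\, u_*(p^0) + \delta^2 \gamma^1_o,$$
$$\gamma^1 = (1 - \delta)\, u(p^0, a^1) + \delta (1 - \delta)\, u_*(p^0) + \delta^2 \gamma^1_s.$$

Thus $\gamma^1_s$ (resp. $\gamma^1_o$) is the continuation payoff of player 1 from stage 3 on which ensures that the expected payoff of player 1 is $\gamma^1$ if player 1 plays the myopically suboptimal (resp. optimal) action at stage 1, and the myopically optimal action $a^0$ at stage 2. Define $\gamma^2_o$ by the equality

$$\gamma^2 = (1 - \delta)\, v_*(q^0) + \delta \gamma^2_o.$$

Because $\gamma^1 \geq u_*(p^0) > u(p^0, a^1)$ and $\gamma^2 \geq v_*(q^0)$, it follows that $\gamma^1_s > \gamma^1_o \geq u_*(p^0)$, while $\gamma^2_o \geq v_*(q^0)$. For $\delta$ high enough, and by definition of $\varepsilon$, one has

$$\gamma^1_s < u_*(p^0) + \max h_{p^0} \quad \text{and} \quad \gamma^2_o < v_*(q^0) + \max h_{q^0}.$$

Hence, there exist $p_s, p_o \in [p^0, p)$ and $q_o \in [q^0, q)$ such that

$$h_{p^0}(p_o) = \gamma^1_o - u_*(p^0), \qquad h_{p^0}(p_s) = \gamma^1_s - u_*(p^0), \qquad h_{q^0}(q_o) = \gamma^2_o - v_*(q^0).$$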


Mimicking the previous section, we define

$$y_{o,m} := \frac{p^1(m)}{p^0(m)}\, y_o \ \text{ for } m \in M, \text{ where } y_o \text{ solves } p^0 = y_o p^1 + (1 - y_o) p_o;$$
$$y_{s,m} := \frac{p^1(m)}{p^0(m)}\, y_s \ \text{ for } m \in M, \text{ where } y_s \text{ solves } p^0 = y_s p^1 + (1 - y_s) p_s;$$
$$x_l := \frac{q^1(l)}{q^0(l)}\, x \ \text{ for } l \in L, \text{ where } x \text{ solves } q^0 = x q^1 + (1 - x) q_o.$$

Footnote: The letters $s$ and $o$ recall that $\gamma^1_s$ and $\gamma^1_o$ are continuation payoffs following a suboptimal and an optimal action, respectively.



We are now in a position to define a profile as follows (see also Figure 8).

Stage 1: Player 2 plays $b^0$, while player 1 plays the two actions $a^1$ and $a^0$ with probabilities $x_l$ and $1 - x_l$. By Bayesian updating, following $a^1$ the belief of player 2 in stage 2 is equal to $q^1$, and it is equal to $q_o$ following $a^0$ (while the belief of player 1 is still $p^0$).

Stage 2: Player 2 randomizes. Following $a^1$, player 2 plays the two actions $b^0$ and $b^1$ with probabilities $y_{s,m}$ and $1 - y_{s,m}$ respectively. Following $a^0$, he plays the two actions $b^1$ and $o(q_o)$ with probabilities $y_{o,m}$ and $1 - y_{o,m}$ respectively. Meanwhile, player 1 plays $a^0$. By Bayesian updating, the belief of player 1 is equal (i) to $p^1$ following either $(a^0, b^1)$ or $(a^1, b^0)$, (ii) to $p_o$ following $(a^0, o(q_o))$, and (iii) to $p_s$ following $(a^1, b^1)$.

Stage 3 and on: If player 2 played his optimal action in stage 2, players repeat their optimal actions. The continuation payoff is then $(u_*(p_s), v_*(q^1))$ following $(a^1, b^1)$, and $(u_*(p_o), v_*(q_o))$ following $(a^0, o(q_o))$. Assume now that player 2 played $b^1$ in stage 2, following $a^0$. Beliefs are then $(p^1, q_o)$, and players switch to the equilibrium profile $(\sigma_\delta^{p^1,q_o}, \tau_\delta^{p^1,q_o})$ of $\Gamma(p^1, q_o)$, with payoff $\left(u_*(p^1),\ v_*(q_o) + \frac{1-\delta}{\delta}\, c(q_o, b^1)\right)$. Finally, assume that player 2 played $b^0$ in stage 2, following $a^1$. Beliefs are then $(p^1, q^1)$, and players switch to the profile $(\sigma_\delta^{p^1,q^1}, \tau_\delta^{p^1,q^1})$.



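Initial beliefs $(p^0, q^0)$, payoff $\gamma$. After the moves of stages 1 and 2:
- $(a^1, b^0)$: beliefs $(p^1, q^1)$, payoff $\left(u_*(p^1),\ v_*(q^1) + \frac{1-\delta}{\delta}\, c(q^1, b^0)\right)$;
- $(a^1, b^1)$: beliefs $(p_s, q^1)$, payoff $(u_*(p_s), v_*(q^1))$;
- $(a^0, b^1)$: beliefs $(p^1, q_o)$, payoff $\left(u_*(p^1),\ v_*(q_o) + \frac{1-\delta}{\delta}\, c(q_o, b^1)\right)$;
- $(a^0, o(q_o))$: beliefs $(p_o, q_o)$, payoff $(u_*(p_o), v_*(q_o))$.

Figure 8: The evolution of beliefs and of continuation payoffs.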
Beliefs and actions at information sets that are ruled out by this description are defined as in the proof of Lemma 1. The equations defining $\gamma^1_s, \gamma^1_o$ (resp. $\gamma^2_o$) ensure that player 1 is indifferent in stage 1 (resp. player 2 in stage 2) between the two actions that are assigned positive probability. This implies the equilibrium property. Details are standard and omitted.

A.4 Proofs of Propositions 1 and 2

We start with the proof of Proposition 1. The construction we provide here is more complex than needed for Proposition 1; however, it will facilitate the proof of Proposition 2. We let initial distributions $p$ and $q$ be given, in the interiors of $\Delta^*(S \times M)$ and $\Delta^*(T \times L)$. Choose a segment $[p^0, p^1]$ included in the interior of $\Delta^*(S \times M)$, such that (i) $u_*$ is not affine on $[p^0, p^1]$, and (ii) $p \in (p^0, p^1)$.

Since $u_*$ is convex (it is a maximum of functions that are linear in the belief), (i) and (ii) imply that $u_*(p) < y\, u_*(p^0) + (1 - y)\, u_*(p^1)$, where $y$ solves $y p^0 + (1 - y) p^1 = p$. Observe also that the quantity $y' u_*(p^0) + (1 - y') u_*(p')$ (with $y' p^0 + (1 - y') p' = p$) is strictly decreasing in a neighborhood of $p^1$, as $p' \in [p^0, p^1]$ moves away from $p^1$ and towards $p$.

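By Lemma 2, there exist $\varepsilon_0 > 0$, $\underline{\delta} < 1$, and neighborhoods $V(p^i)$ and $V(q^j)$ of $p^i$ and $q^j$ ($i, j \in \{0, 1\}$), such that, for every $p' \in V(p^i)$ and $q' \in V(q^j)$, any payoff in $[u_*(p'), u_*(p') + \varepsilon_0] \times [v_*(q'), v_*(q') + \varepsilon_0]$ is a sequential equilibrium payoff of the game $\Gamma(p', q')$, as soon as $\delta > \underline{\delta}$.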

We now prove that the conclusion of Proposition 1 holds with $\varepsilon = \varepsilon_0$. Let $\gamma \in [u_*(p), u_*(p) + \varepsilon_0] \times [v_*(q), v_*(q) + \varepsilon_0]$ be given. We describe an equilibrium profile that implements $\gamma$.

One main feature of this profile is the following. As a result of information disclosure by player 2, player 1's belief will move in one stage from $p$ to a belief $\tilde p$ close to either $p^0$ or $p^1$. Similarly, player 2's belief will change in exactly one stage to a belief $\tilde q$ close to either $q^0$ or $q^1$. From that point on, players implement an equilibrium of $\Gamma(\tilde p, \tilde q)$ with the appropriate payoff. There is, however, one minor difference with previously defined equilibria. If $u_*(p) \leq \gamma^1 < y\, u_*(p^0) + (1 - y)\, u_*(p^1)$, then the expected payoff of player 1 if we follow the previous construction will be higher than $\gamma^1$, which is the target payoff. There are two ways to overcome this difficulty. One way is to choose in this case $p^0$ and $p^1$ closer to $p$, thereby lowering the expected continuation payoff $y\, u_*(p^0) + (1 - y)\, u_*(p^1)$. A second way, which we adopt here, is to delay information revelation, so that the discounted payoff is lower than $y\, u_*(p^0) + (1 - y)\, u_*(p^1)$.

Define $N_1 \geq 1$ to be the least integer such that $\gamma^1_c \geq y\, u_*(p^0) + (1 - y)\, u_*(p^1)$, where $\gamma^1_c$ is defined by $\gamma^1 = (1 - \delta^{N_1 - 1})\, u_*(p) + \delta^{N_1 - 1} \gamma^1_c$. The inequality $\gamma^1_c \geq y\, u_*(p^0) + (1 - y)\, u_*(p^1)$ ensures that, if player 2 starts revealing information at stage $N_1$, then one can support $\gamma^1_c$ as a continuation payoff of player 1 at that stage. Define $N_2$ in a similar way for player 2, and assume w.l.o.g. that $N_1 \leq N_2$. Information is first disclosed at stage $N_1$. The choice of $N_1$ implies

$$\gamma^1_c - \big(y\, u_*(p^0) + (1 - y)\, u_*(p^1)\big) < \frac{1-\delta}{\delta} \big(y\, u_*(p^0) + (1 - y)\, u_*(p^1) - u_*(p)\big),$$

provided $\delta$ is high enough.

This implies that, for $\delta$ high enough, there is $\tilde p^1 \in V(p^1) \cap [p^0, p^1]$ such that $\gamma^1_c = y\, u_*(p^0) + (1 - y)\, u_*(\tilde p^1)$, where now $y p^0 + (1 - y) \tilde p^1 = p$.
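The delay $N_1$ can be found by a direct search, as in the following Python sketch. The payoff levels are hypothetical, and `threshold` stands for $y\, u_*(p^0) + (1-y)\, u_*(p^1)$. The test also checks the bound on $\gamma^1_c$ displayed above.

```python
def least_delay(gamma, u_star, threshold, delta, max_stages=10**6):
    """Least N >= 1 with gamma_c >= threshold, where
    gamma = (1 - delta**(N - 1)) * u_star + delta**(N - 1) * gamma_c.

    Assumes u_star <= gamma. Returns None when gamma == u_star (N = infinity),
    otherwise the pair (N, gamma_c).
    """
    if gamma <= u_star:
        return None
    for n in range(1, max_stages + 1):
        weight = delta ** (n - 1)
        gamma_c = (gamma - (1 - weight) * u_star) / weight
        if gamma_c >= threshold:
            return n, gamma_c
    raise ValueError("no admissible delay within max_stages")


# Hypothetical numbers: u_*(p) = 0.5, y u_*(p^0) + (1 - y) u_*(p^1) = 0.6, target 0.52.
u_star, threshold, delta = 0.5, 0.6, 0.9
n1, gamma_c = least_delay(0.52, u_star, threshold, delta)
print(n1, gamma_c)
```

The search terminates because $\gamma^1_c - u_*(p) = (\gamma^1 - u_*(p))/\delta^{N-1}$ grows geometrically in $N$ whenever $\gamma^1 > u_*(p)$.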

We first define a strategy pair $(\sigma, \tau)$ up to stage $N_1 + 1$. Player 1 repeats an optimal action $o(p)$ at all stages $1, \ldots, N_1$. Player 2 plays $o(q)$ at all stages $1, \ldots, N_1 - 1$. In stage $N_1$, player 2 plays both actions $o(q)$ and $b' \neq o(q)$ with probabilities such that beliefs in stage $N_1 + 1$ are $(p^0, q)$ following $b'$, and $(\tilde p^1, q)$ following $o(q)$.

We now define the continuation of $(\sigma, \tau)$ following $o(q)$. Define $\gamma^2_1$ by the equality $\gamma^2 = (1 - \delta^{N_1})\, v_*(q) + \delta^{N_1} \gamma^2_1$. The continuation of $(\sigma, \tau)$ in the other case is defined in an analogous way, except that $\gamma^2_1$ has to be replaced by $\gamma^2_1 + \frac{1-\delta}{\delta}\, c(q, b')$, and the equations that describe equilibrium constraints have to be adjusted.

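Let $N_2$ be the least integer (possibly infinite) such that $\gamma^2_c \geq x\, v_*(q^0) + (1 - x)\, v_*(q^1)$, where $\gamma^2_c$ is defined by $\gamma^2_1 = (1 - \delta^{N_2 - 1})\, v_*(q) + \delta^{N_2 - 1} \gamma^2_c$. The choice of $N_2$ implies

$$\gamma^2_c - \big(x\, v_*(q^0) + (1 - x)\, v_*(q^1)\big) < \frac{1-\delta}{\delta} \big(x\, v_*(q^0) + (1 - x)\, v_*(q^1) - v_*(q)\big),$$

provided $\delta$ is high enough. This implies that, for $\delta$ high enough, there is $\tilde q^1 \in V(q^1) \cap [q^0, q^1]$ such that $\gamma^2_c = x\, v_*(q^0) + (1 - x)\, v_*(\tilde q^1)$, where now $x q^0 + (1 - x) \tilde q^1 = q$.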

The continuation profile is defined as follows. Player 2 repeats $o(q)$ in all stages $N_1 + 1, \ldots, N_1 + N_2$. Player 1 repeats $o(\tilde p^1)$ in all stages $N_1 + 1, \ldots, N_1 + N_2 - 1$. In stage $N_1 + N_2$, player 1 plays both actions $o(\tilde p^1)$ and $a \neq o(\tilde p^1)$ with probabilities such that the belief of player 2 is equal to $\tilde q^1$ following $o(\tilde p^1)$, and to $q^0$ following $a$.



Footnote: $N_1 = \infty$ if $\gamma^1 = u_*(p)$.
Following $o(\tilde p^1)$, players switch to an equilibrium of the game $\Gamma(\tilde p^1, \tilde q^1)$ with payoff $(u_*(\tilde p^1), v_*(\tilde q^1))$. Following $a$, players switch to an equilibrium of the game $\Gamma(\tilde p^1, q^0)$ with payoff $\left(u_*(\tilde p^1) + \frac{1-\delta}{\delta}\, c(\tilde p^1, a),\ v_*(q^0)\right)$.

Beliefs and actions off the equilibrium path are defined as in the proof of Lemma 1. The definitions of beliefs and continuation payoffs ensure that players are indifferent whenever randomizing, and that the overall payoff is exactly $\gamma$.

Observe also that there exists a neighborhood $V(p)$ of $p$ with the following property: the two beliefs $p'^0$ and $p'^1$ associated with $p' \in V(p)$ can be chosen to be continuous in $p'$, and $x' u_*(p'^0) + (1 - x') u_*(p'^1)$ (with $x' p'^0 + (1 - x') p'^1 = p'$) is bounded away from $u_*(p')$ over $V(p)$. Together with the symmetric property for player 2, this ensures that the robustness result mentioned after Proposition 1 holds. This concludes the proof of Proposition 1.

Proof of Proposition 2. Let $p \in \Delta^*(S \times M)$, $q \in \Delta^*(T \times L)$, and $c > 0$ be given. Let $\gamma$ be such that $|\gamma^1 - u_*(p)| < (1 - \delta) c$ and $|\gamma^2 - v_*(q)| < (1 - \delta) c$. Let $[p^0, p^1]$ be the segment associated with $p$ in the proof of Proposition 1, and let $y$ solve the equation $p = y p^0 + (1 - y) p^1$. Set

$$\eta := \big(y\, u_*(p^0) + (1 - y)\, u_*(p^1)\big) - u_*(p) > 0,$$

and let $N_1$ be defined as in the proof of Proposition 1. By construction, one has

$$(1 - \delta^{N_1 - 1})\, u_*(p) + \delta^{N_1 - 1} \big(u_*(p) + \eta\big) \leq \gamma^1 < u_*(p) + (1 - \delta) c,$$

hence $\eta\, \delta^{N_1 - 1} < (1 - \delta) c$. Similarly, one has $\eta\, \delta^{N_2 - 1} < (1 - \delta) c$ (for a possibly lower value of $\eta$). In the construction of Proposition 1, players repeat the same action until stage $\min\{N_1, N_2\}$. Therefore, for any $p' \in \Delta^*(S \times M)$ and every strategy $\sigma$, one has

$$\gamma^1(p', q, \sigma, \tau) \leq (1 - \delta^{\min\{N_1, N_2\} - 1})\, u_*(p') + \delta^{\min\{N_1, N_2\} - 1} < u_*(p') + (1 - \delta) \frac{c}{\eta}.$$

The result follows, with $C = c/\eta$.
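For readability, the two steps behind the last displays can be spelled out. The second line uses, as an assumption made only for this illustration, that stage payoffs are normalized to lie in $[0, 1]$, so that any continuation payoff is at most 1 and $u_*(p') \geq 0$.

```latex
\begin{aligned}
u_*(p) + \delta^{N_1-1}\eta \;\le\; \gamma^1 \;<\; u_*(p) + (1-\delta)c
  &\;\Longrightarrow\; \delta^{N_1-1} < \frac{(1-\delta)c}{\eta},\\
\gamma^1(p',q,\sigma,\tau) \;\le\; (1-\delta^{k})\,u_*(p') + \delta^{k}
  \;\le\; u_*(p') + \delta^{k}
  &\;<\; u_*(p') + (1-\delta)\frac{c}{\eta},
  \qquad k := \min\{N_1,N_2\}-1 .
\end{aligned}
```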

B General games 

We here complete the proof of the Main Theorem. We start with a few notations and remarks in the spirit of Section A. We let initial distributions $p \in \Delta(S \times L_S \times M_S)$ and $q \in \Delta(T \times L_T \times M_T)$ be given. W.l.o.g., we also assume that $p(l_S) > 0$ and $q(l_T) > 0$ for each $l_S \in L_S$ and $l_T \in L_T$. We assume that the information of each player is valuable for the other player. This is equivalent to assuming that, for each $l_S \in L_S$, there is no action $a \in A$ that is optimal at all beliefs $p_{l_S, m_S} := p(\cdot \mid l_S, m_S)$, $m_S \in M_S$.

As the play proceeds, player 2 may disclose information relative to $m_S$, and player 1's belief about $m_S$ may change. Analogously to the case of self-ignorant games (see Eq. (5)), the belief $p_n$ of player 1 given $\mathbf{l}_S = l_S$ is always in the set

$$\Delta_{l_S}(S \times M_S) := \mathrm{conv}\{p_{l_S, m_S} \otimes \mathbf{1}_{m_S},\ m_S \in M_S\}.$$

Note that $p(\cdot \mid l_S)$ lies in the relative interior of the set $\Delta_{l_S}(S \times M_S)$.

For $m_T \in M_T$, we define $\Delta_{m_T}(T \times L_T)$ in a symmetric way. The results of Section A will be applied to the different sets $\Delta_{l_S}(S \times M_S)$ and $\Delta_{m_T}(T \times L_T)$ of initial distributions.



B.1 Providing Incentives

For simplicity, we focus here on player 1. Analogous properties hold for player 2 as well. We first define an equivalence relation $\sim$ over $L_S$. As we will see, two signals $l_S$ and $\underline{l}_S$ such that $l_S \sim \underline{l}_S$ may be merged, and treated as a single signal. Given $l_S \in L_S$, we define a vector $Z^{l_S}$ of size $M_S \times A \times A$ by

$$Z^{l_S}_{m_S, a, a'} := p(m_S \mid l_S)\, \big(u(p_{l_S, m_S}, a) - u(p_{l_S, m_S}, a')\big), \quad m_S \in M_S,\ a, a' \in A.$$

Because the information held by player 2 is valuable for player 1, $Z^{l_S} \neq 0$ for each $l_S \in L_S$.
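For finite data, the vector $Z^{l_S}$ is straightforward to tabulate. The Python sketch below uses hypothetical values of $p(m_S \mid l_S)$ and $u(p_{l_S, m_S}, a)$, stored in plain dictionaries.

```python
def z_vector(cond_m, payoff_given_m, actions):
    """Z^{l_S}_{m, a, a'} = p(m | l_S) * (u(p_{l_S, m}, a) - u(p_{l_S, m}, a')).

    cond_m[m] is p(m | l_S); payoff_given_m[m][a] is u(p_{l_S, m}, a).
    Returns a dict keyed by (m, a, a').
    """
    return {
        (m, a, a2): cond_m[m] * (payoff_given_m[m][a] - payoff_given_m[m][a2])
        for m in cond_m
        for a in actions
        for a2 in actions
    }


# Hypothetical data: the optimal action depends on m, so player 2's information is valuable.
cond_m = {0: 0.5, 1: 0.5}
payoff_given_m = {0: {"L": 1.0, "R": 0.0}, 1: {"L": 0.0, "R": 1.0}}
Z = z_vector(cond_m, payoff_given_m, ["L", "R"])
print(Z[(0, "L", "R")], Z[(1, "L", "R")])
```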
Definition 2 Let $l_S, \underline{l}_S \in L_S$ be given. The two signals $l_S$ and $\underline{l}_S$ are equivalent, written $l_S \sim \underline{l}_S$, if the two vectors $Z^{l_S}$ and $Z^{\underline{l}_S}$ are positively collinear, that is, if

$$\exists \alpha > 0, \quad Z^{l_S} = \alpha Z^{\underline{l}_S}. \qquad (7)$$

Plainly, if the two distributions $p(\cdot \mid l_S)$ and $p(\cdot \mid \underline{l}_S)$ in $\Delta(S \times M_S)$ coincide, then $l_S \sim \underline{l}_S$. However, the converse implication does not hold.
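Positive collinearity in (7) can be tested through the equality case of the Cauchy-Schwarz inequality, which the proof of Lemma 5 also exploits. A minimal sketch, with vectors as equal-length lists and a floating-point tolerance chosen for illustration:

```python
import math


def positively_collinear(z1, z2, rel_tol=1e-9):
    """Test whether z1 = alpha * z2 for some alpha > 0 (vectors given as equal-length lists).

    Uses the equality case of Cauchy-Schwarz: z1 . z2 = ||z1|| ||z2|| with both vectors nonzero.
    """
    dot = sum(a * b for a, b in zip(z1, z2))
    n1 = math.sqrt(sum(a * a for a in z1))
    n2 = math.sqrt(sum(b * b for b in z2))
    if n1 == 0 or n2 == 0:
        return False
    return math.isclose(dot, n1 * n2, rel_tol=rel_tol)


print(positively_collinear([2.0, 4.0, -6.0], [1.0, 2.0, -3.0]))  # True
print(positively_collinear([1.0, 2.0], [-1.0, -2.0]))            # False: alpha would be negative
```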



Footnote: And we make the symmetric assumption for player 2.

Footnote: Indeed, if $Z^{l_S} = 0$, then any action $a \in A$ is optimal at $p_{l_S, m_S}$, for each $m_S$.
Observe that, if $l_S \sim \underline{l}_S$ and $Z^{l_S} = \alpha Z^{\underline{l}_S}$, then for every two mixed actions $x, x' \in \Delta(A)$ and every $m_S \in M_S$ we have:

$$p(m_S \mid l_S)\, \big(u(p_{l_S, m_S}, x) - u(p_{l_S, m_S}, x')\big) = \alpha\, p(m_S \mid \underline{l}_S)\, \big(u(p_{\underline{l}_S, m_S}, x) - u(p_{\underline{l}_S, m_S}, x')\big). \qquad (8)$$

As a preparation for Lemma 4 below, observe that a strategy $\sigma$ may be viewed as a collection $(\sigma_{l_S})_{l_S \in L_S}$, with the interpretation that $\sigma_{l_S} : L_T \times H \to \Delta(A)$ is the 'interim' strategy used if $\mathbf{l}_S = l_S$.
Lemma 4 Let $\tau$ be any strategy of player 2. Then there exists a best reply $\sigma$ of player 1 to $\tau$ such that $\sigma_{l_S} = \sigma_{\underline{l}_S}$ whenever $l_S \sim \underline{l}_S$.

According to Lemma 4, player 1 has a best reply that depends only on the equivalence class of $\mathbf{l}_S$.

Proof. Let a strategy $\tau$ of player 2 be fixed throughout. Given $f : H \to \Delta(A)$ and $(l_S, l_T) \in L_S \times L_T$, we denote by $\gamma^1(f, \tau \mid l_S, l_T)$ the interim expected payoff of player 1 when getting $\mathbf{l}_S = l_S$, $\mathbf{l}_T = l_T$, and when playing according to $f$ thereafter. Given $n \geq 1$, we also denote by $g^1_n(f, \tau \mid l_S, l_T)$ the corresponding payoff at stage $n$.

We let $l_S \sim \underline{l}_S$ be any two equivalent signals, so that $Z^{l_S} = \alpha Z^{\underline{l}_S}$ for some $\alpha > 0$. We will prove that, for every two interim strategies $f : H \to \Delta(A)$ and $f' : H \to \Delta(A)$, for every $l_T \in L_T$ and every stage $n \geq 1$, one has

$$g^1_n(f, \tau \mid l_S, l_T) - g^1_n(f', \tau \mid l_S, l_T) = \alpha \big(g^1_n(f, \tau \mid \underline{l}_S, l_T) - g^1_n(f', \tau \mid \underline{l}_S, l_T)\big). \qquad (9)$$

Equation (9) will imply that

$$\gamma^1(f, \tau \mid l_S, l_T) - \gamma^1(f', \tau \mid l_S, l_T) = \alpha \big(\gamma^1(f, \tau \mid \underline{l}_S, l_T) - \gamma^1(f', \tau \mid \underline{l}_S, l_T)\big),$$

from which the result follows. Indeed, since $\alpha > 0$, if $f$ is better than $f'$ when the signal is $l_S$, then it is also the case when the signal is $\underline{l}_S$. Therefore, if $f$ is a best response when the signal is $l_S$, then it is also a best response when the signal is $\underline{l}_S$.

We let a stage $n \geq 1$ be given. We fix $(l_S, l_T) \in L_S \times L_T$, and we decompose the payoff $g^1_n(f, \tau \mid l_S, l_T)$ as follows. For a given sequence of moves $h \in H_n := (A \times B)^{n-1}$, we denote by $P_{f,\tau}(h \mid l_S, l_T)$ the probability that $h$ occurs when $(\mathbf{l}_S, \mathbf{l}_T) = (l_S, l_T)$ and players play according to $f$ and $\tau$. We denote by $P_{f,\tau}(\cdot \mid h, l_S, l_T) \in \Delta(S \times M_S)$ the belief which is then held by player 1.

Footnote: To be formal, $\sigma_{l_S}(l_T, h)$ is defined to be $\sigma(l_S, l_T, h)$.

With these notations, one has

$$g^1_n(f, \tau \mid l_S, l_T) = \sum_{h \in H_n} P_{f,\tau}(h \mid l_S, l_T)\, u\big(P_{f,\tau}(\cdot \mid h, l_S, l_T), f(h)\big). \qquad (10)$$

The belief of player 1 following $h$ is given by

$$P_{f,\tau}(s \mid h, l_S, l_T) = \frac{1}{P_{f,\tau}(h, l_S, l_T)} \sum_{m_S \in M_S} P_{f,\tau}(s, h, l_S, l_T, m_S), \quad s \in S,$$

where $P_{f,\tau}(h, l_S, l_T) = P_{f,\tau}(h \mid l_S, l_T)\, p(l_S)\, q(l_T)$. Because the state $s$ and the history of moves until stage $n$ are conditionally independent given $(\mathbf{l}_S, \mathbf{l}_T, \mathbf{m}_S)$, this belief is equal to

$$P_{f,\tau}(s \mid h, l_S, l_T) = \frac{1}{P_{f,\tau}(h, l_S, l_T)} \sum_{m_S \in M_S} P_{f,\tau}(h \mid l_S, l_T, m_S)\, p_{l_S, m_S}(s)\, p(l_S, m_S)\, q(l_T). \qquad (11)$$



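Plugging (11) into (10), and using the linearity of $u$, one gets

$$g^1_n(f, \tau \mid l_S, l_T) = \sum_{h \in H_n} \sum_{m_S \in M_S} p(m_S \mid l_S)\, P_{f,\tau}(h \mid l_S, l_T, m_S)\, u\big(p_{l_S, m_S}, f(h)\big). \qquad (12)$$

Using (12) for both $f$ and $f'$, and because $\sum_{h \in H_n} P_{f,\tau}(h \mid l_S, l_T, m_S) = \sum_{h' \in H_n} P_{f',\tau}(h' \mid l_S, l_T, m_S) = 1$, we obtain:

$$
\begin{aligned}
&g^1_n(f, \tau \mid l_S, l_T) - g^1_n(f', \tau \mid l_S, l_T) \\
&\quad = \sum_{m_S \in M_S} p(m_S \mid l_S) \Big( \sum_{h \in H_n} P_{f,\tau}(h \mid l_S, l_T, m_S)\, u\big(p_{l_S, m_S}, f(h)\big) - \sum_{h' \in H_n} P_{f',\tau}(h' \mid l_S, l_T, m_S)\, u\big(p_{l_S, m_S}, f'(h')\big) \Big) \\
&\quad = \sum_{m_S \in M_S} \sum_{h, h' \in H_n} P_{f,\tau}(h \mid l_S, l_T, m_S)\, P_{f',\tau}(h' \mid l_S, l_T, m_S)\, p(m_S \mid l_S) \big( u(p_{l_S, m_S}, f(h)) - u(p_{l_S, m_S}, f'(h')) \big).
\end{aligned} \qquad (13)
$$

Because $l_S$ and $\underline{l}_S$ are equivalent, and since the probabilities $P_{f,\tau}(h \mid l_S, l_T, m_S)$ do not depend on $l_S$ once $m_S$ is given, Eq. (9) follows by (8) and (13) applied to both $l_S$ and $\underline{l}_S$. ■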

Lemma 4 implies that any equilibrium of the modified game in which player 1 only observes the equivalence class of $\mathbf{l}_S$ is an equilibrium of the original game. Put differently, the set of equilibrium payoffs of the game in which players do not distinguish between equivalent signals is a subset of the set of equilibrium payoffs of the game we started with. Besides, the values of $u_*$ and $v_*$ (resp., of $u_{**}$ and $v_{**}$) are the same for both games.

Therefore, it is sufficient to prove that the conclusion of the Main Theorem holds for the modified game. In particular, we may and will assume from here on that, for every two signals $l_S \neq \underline{l}_S$, the vectors $Z^{l_S}$ and $Z^{\underline{l}_S}$ are not positively collinear. We also make the symmetric assumption for player 2. A direct consequence of this assumption is Corollary 1 below.

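Corollary 1 Let $\bar a \in A$ be arbitrary. For $l_S \in L_S$, define the vector $Y^{l_S}$ of size $M_S \times A$ by

$$Y^{l_S}_{m_S, a} := Z^{l_S}_{m_S, a, \bar a} = p(m_S \mid l_S)\, \big(u(p_{l_S, m_S}, a) - u(p_{l_S, m_S}, \bar a)\big), \quad m_S \in M_S,\ a \in A.$$

Then for every two signals $l_S \neq \underline{l}_S$, the two vectors $Y^{l_S}$ and $Y^{\underline{l}_S}$ are not positively collinear.

The vector $Y^{l_S}$ is a projection of $Z^{l_S}$ on a lower-dimensional space. Hence, the fact that $Y^{l_S}$ and $Y^{\underline{l}_S}$ are not positively collinear does not follow in general from the corresponding property of $Z^{l_S}$ and $Z^{\underline{l}_S}$, and an ad hoc proof is needed.

Proof. We argue by contradiction, and assume that $Y^{l_S} = \alpha Y^{\underline{l}_S}$ for some $\alpha > 0$. Let $m_S \in M_S$ and $a, a' \in A$ be arbitrary. Observe that $Z^{l}_{m_S, a, a'} = Y^{l}_{m_S, a} - Y^{l}_{m_S, a'}$, for $l = l_S, \underline{l}_S$. Hence $Z^{l_S} = \alpha Z^{\underline{l}_S}$, a contradiction. ■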
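The identity $Z^{l}_{m_S,a,a'} = Y^{l}_{m_S,a} - Y^{l}_{m_S,a'}$ used in the proof is easy to check numerically. The data below, and the helper name `y_vector`, are hypothetical.

```python
def y_vector(z, messages, actions, a_bar):
    """Y^{l_S}_{m, a} := Z^{l_S}_{m, a, a_bar} for a fixed reference action a_bar."""
    return {(m, a): z[(m, a, a_bar)] for m in messages for a in actions}


# Hypothetical Z built from p(m | l_S) and u(p_{l_S, m}, a), as in B.1.
cond_m = {0: 0.25, 1: 0.75}
payoff_given_m = {0: {"L": 1.0, "R": 0.2, "C": 0.5}, 1: {"L": 0.0, "R": 0.9, "C": 0.4}}
actions = ["L", "R", "C"]
z = {(m, a, a2): cond_m[m] * (payoff_given_m[m][a] - payoff_given_m[m][a2])
     for m in cond_m for a in actions for a2 in actions}
y = y_vector(z, cond_m, actions, a_bar="L")

# Identity used in the proof of Corollary 1: Z_{m,a,a'} = Y_{m,a} - Y_{m,a'}.
for (m, a, a2), value in z.items():
    assert abs(value - (y[(m, a)] - y[(m, a2)])) < 1e-12
print(y)
```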

The next lemma is central to the provision of incentives (Phase 2 of the equilibrium play). Given $x : L_S \times M_S \to \Delta(A)$, and for every $l_S, k \in L_S$, we define

$$E_x[l_S \to k] := \sum_{m_S \in M_S} p(m_S \mid l_S)\, u(p_{l_S, m_S}, x_{k, m_S}),$$

with the following interpretation. The expression $E_x[l_S \to k]$ is the expected stage payoff when player 1 gets $\mathbf{l}_S = l_S \in L_S$, 'reports' $k \in L_S$, is told $m_S$, and plays the mixed action $x_{k, m_S}$ that depends on player 1's report and on player 2's signal $m_S$. According to Lemma 5 below, the map $x$ can be chosen so that this expected payoff is highest when reporting truthfully.

Footnote: We use the different letter $k$ to distinguish between a signal and a report, although both belong to the same set $L_S$.
Lemma 5 There exists $x^* : L_S \times M_S \to \Delta(A)$ such that

$$E_{x^*}[l_S \to k] < E_{x^*}[l_S \to l_S], \quad \text{for every } l_S, k \in L_S,\ l_S \neq k.$$

Proof. Let $\bar a \in A$ be arbitrary, and let $x : L_S \times M_S \to \Delta(A)$ be given. For $k \in L_S$, we define a vector $X^k$ of size $M_S \times A$ by $X^k_{m_S, a} := x_{k, m_S}(a)$, $m_S \in M_S$, $a \in A$. Observe that $x_{k, m_S}(\bar a) = 1 - \sum_{a \neq \bar a} x_{k, m_S}(a)$. Hence, $E_x[l_S \to k]$ may be rewritten as

$$
\begin{aligned}
E_x[l_S \to k] &= \sum_{m_S \in M_S} p(m_S \mid l_S)\, u(p_{l_S, m_S}, x_{k, m_S}) \\
&= Y^{l_S} \cdot X^k + \sum_{m_S \in M_S} p(m_S \mid l_S)\, u(p_{l_S, m_S}, \bar a).
\end{aligned}
$$

Because the second term in the last displayed equation does not depend on $k$, it is sufficient to construct $x$ such that

$$Y^{l_S} \cdot X^k < Y^{l_S} \cdot X^{l_S} \quad \text{for every } l_S, k \in L_S,\ l_S \neq k. \qquad (14)$$

For $l_S \in L_S$, define

$$\hat X^{l_S} := \frac{Y^{l_S}}{\|Y^{l_S}\|_2},$$

and let $l_S \neq k$ be arbitrary in $L_S$. Then, by the Cauchy-Schwarz inequality,

$$\hat X^k \cdot Y^{l_S} = \frac{Y^k \cdot Y^{l_S}}{\|Y^k\|_2} < \|Y^{l_S}\|_2 = \frac{Y^{l_S} \cdot Y^{l_S}}{\|Y^{l_S}\|_2} = \hat X^{l_S} \cdot Y^{l_S},$$

where the strict inequality holds since $Y^k$ and $Y^{l_S}$ are not positively collinear. Therefore, (14) holds with $(\hat X^{l_S})_{l_S}$. Note that (14) still holds when the same constant is added to all components, and/or when all components are multiplied by the same constant $\varphi > 0$. Choose $\beta \in \mathbb{R}$ and $\varphi > 0$ such that all components of $\varphi \hat X^{l_S} + \beta$ lie in $(0, 1/|A|)$, for all $l_S$. Because $Y^{l_S}_{m_S, \bar a} = 0$, it suffices to set

$$x^*_{l_S, m_S}(a) := \varphi \hat X^{l_S}_{m_S, a} + \beta \ \text{ for } a \neq \bar a, \qquad x^*_{l_S, m_S}(\bar a) := 1 - \sum_{a \neq \bar a} x^*_{l_S, m_S}(a).$$

■
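The proof of Lemma 5 is constructive, and the construction can be run on small examples. The Python sketch below takes the vectors $Y^{l_S}$ as input (flattened, hypothetical values) and uses one admissible choice, $\varphi = 1/(4|A|)$ and $\beta = 1/(2|A|)$. This particular choice is ours; the proof only requires the components to lie in $(0, 1/|A|)$. The sketch then checks inequality (14).

```python
import math


def truthful_mixed_actions(Y, num_messages, num_actions, a_bar):
    """Construct x*[l][m] (a distribution over actions) as in the proof of Lemma 5.

    Y[l] is a flat list indexed by m * num_actions + a, with zero entries at a = a_bar.
    Uses phi = 1 / (4|A|) and beta = 1 / (2|A|), so that phi * X_hat + beta lies in
    (0, 1/|A|) because every component of the unit vector X_hat is in [-1, 1].
    """
    phi = 1.0 / (4 * num_actions)
    beta = 1.0 / (2 * num_actions)
    x_star = {}
    for l, y in Y.items():
        norm = math.sqrt(sum(v * v for v in y))  # nonzero, since Z^{l_S} != 0
        x_hat = [v / norm for v in y]
        rows = []
        for m in range(num_messages):
            probs = [phi * x_hat[m * num_actions + a] + beta for a in range(num_actions)]
            probs[a_bar] = 1.0 - sum(p for a, p in enumerate(probs) if a != a_bar)
            rows.append(probs)
        x_star[l] = rows
    return x_star


def report_score(Y, x_star, l, k, num_actions):
    """Y^l . X^k, the only k-dependent part of E_{x*}[l -> k]."""
    return sum(
        Y[l][m * num_actions + a] * x_star[k][m][a]
        for m in range(len(x_star[k]))
        for a in range(num_actions)
    )


# Hypothetical Y vectors (|M_S| = 2, |A| = 2, a_bar = 0), pairwise not positively collinear.
Y = {"l1": [0.0, 1.0, 0.0, -1.0], "l2": [0.0, -1.0, 0.0, 1.0], "l3": [0.0, 1.0, 0.0, 1.0]}
x_star = truthful_mixed_actions(Y, num_messages=2, num_actions=2, a_bar=0)
for l in Y:
    for k in Y:
        if k != l:
            assert report_score(Y, x_star, l, k, 2) < report_score(Y, x_star, l, l, 2)
```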
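Given $\varepsilon_2 : M_S \to \Delta(M_S)$, and $l_S, k \in L_S$, we define

$$E_{\varepsilon_2, x}[l_S \to k] := \sum_{m_S \in M_S} \sum_{\mu \in M_S} p(m_S \mid l_S)\, \varepsilon_2(\mu \mid m_S)\, u(p_{l_S, \mu}, x_{k, \mu}).$$

This is the expected stage payoff of player 1 when (i) player 1 gets $\mathbf{l}_S = l_S$ and 'reports' $k$, (ii) player 2 draws $\mu \in M_S$ according to $\varepsilon_2(\cdot \mid m_S)$, and (iii) player 1 plays $x_{k, \mu}$. We here abuse notation and write $p_{l_S, \mu}$ for the belief of player 1, given $l_S$ and $\mu$.

Observe that the expectation $E_{\varepsilon_2, x^*}[l_S \to k]$ is continuous w.r.t. $\varepsilon_2$, and that $E_{\varepsilon_2, x^*}[l_S \to k]$ is equal to $E_{x^*}[l_S \to k]$ when $\varepsilon_2(\cdot \mid m_S)$ assigns probability 1 to $m_S$, for each $m_S$. Corollary 2 below therefore follows from Lemma 5 by continuity.

Corollary 2 There exists $\varepsilon_2 : M_S \to \Delta(M_S)$ such that

$$E_{\varepsilon_2, x^*}[l_S \to k] < E_{\varepsilon_2, x^*}[l_S \to l_S], \quad \text{for every } l_S, k \in L_S,\ l_S \neq k. \qquad (15)$$

We fix $\varepsilon_2$ and $x^*$ for the rest of the paper; by continuity, $\varepsilon_2$ can be chosen so that each distribution $\varepsilon_2(\cdot \mid m_S)$ has full support. Because the distribution $\varepsilon_2(\cdot \mid m_S)$ has full support, the conditional distribution $p_{l_S, \mu}$ lies in the relative interior of $\Delta_{l_S}(S \times M_S)$ (for each $\mu \in M_S$). Define $\varepsilon_1$ analogously.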

B.2 Equilibrium strategies — Structure 

We let a payoff vector $\gamma = (\gamma^1, \gamma^2)$ be given, with $u_* < \gamma^1 < u_{**}$ and $v_* < \gamma^2 < v_{**}$. We will construct a sequential equilibrium with payoff $\gamma$. We let the discount factor $\delta$ be given. In the construction, we add one additional message, $\square$, to each player.

Given $x \in \Delta(A)$ and a number $N$ of stages, we denote by $a^N(x) \in A^N$ a sequence of actions of length $N$ that provides the best approximation of the mixed action $x$ in terms of discounted frequencies. That is, $a^N(x) = (a_n)_{1 \leq n \leq N}$ is chosen to minimize $\|x_\delta(a^N) - x\|_\infty$, where

$$x_\delta(a^N) := \frac{1 - \delta}{1 - \delta^N} \sum_{n=1}^{N} \delta^{n-1} \mathbf{1}_{a_n} \in \Delta(A).$$
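For small $N$, a best approximation $a^N(x)$ can be found by exhaustive search over $A^N$. The following sketch does exactly that. It is exponential in $N$ and meant only to illustrate the definition, since the text does not specify how the minimizer is computed.

```python
from itertools import product


def discounted_frequency(seq, num_actions, delta):
    """x_delta(a^N): normalized discounted frequency of each action in seq."""
    n = len(seq)
    norm = (1 - delta) / (1 - delta ** n)
    freq = [0.0] * num_actions
    for t, a in enumerate(seq):  # t = n - 1 in the paper's 1-based indexing
        freq[a] += norm * delta ** t
    return freq


def best_approximation(x, n, delta):
    """Brute-force a^N(x): a sequence in A^N minimizing the sup-norm distance to x."""
    num_actions = len(x)
    best, best_err = None, float("inf")
    for seq in product(range(num_actions), repeat=n):
        freq = discounted_frequency(seq, num_actions, delta)
        err = max(abs(f - xi) for f, xi in zip(freq, x))
        if err < best_err:
            best, best_err = seq, err
    return best, best_err


seq, err = best_approximation([0.5, 0.5], 4, 0.9)
print(seq, err)
```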



The sequence $a^\nu(x^*_{k, \mu_1})$ will be the sequence of actions required from player 1 in Phase 2.2, when player 1 reports $k \in L_S$ and player 2 sends the message $\mu_1 \in M_S \cup \{\square\}$. For $\mu_1 = \square$, we let $a^\nu(x^*_{k, \square})$ be an arbitrary sequence of actions that does not depend on $k \in L_S$.

Footnote: Note that, for fixed $\mu$, the belief $p_{l_S, \mu}$ depends on $\varepsilon_2$, although this is not emphasized in the notation.

Similarly, $b^N(y) \in B^N$ is a vector that approximates the mixed action $y$ in terms of discounted frequencies.

We set $K_1 := \max\{|L_S|, |M_T|\}$, and we let $\alpha_1 : L_S \to A^{K_1}$ and $\beta_1 : M_T \to B^{K_1}$ be arbitrary one-to-one maps. Similarly, we set $K_2 := 1 + \max\{|L_T|, |M_S|\}$, and we let $\alpha_2 : L_T \cup \{\square\} \to A^{K_2}$ and $\beta_2 : M_S \cup \{\square\} \to B^{K_2}$ be arbitrary one-to-one maps. The maps $\alpha_1$ and $\beta_1$ are used to encode reports on one's own state into sequences of actions, while the maps $\alpha_2$ and $\beta_2$ are used to encode messages on the other player's state into sequences of actions.
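Any one-to-one map works for $\alpha_1, \beta_1, \alpha_2, \beta_2$. The sketch below builds one such map and its inverse by listing action sequences in lexicographic order. The message names, including `EMPTY` for $\square$, are hypothetical.

```python
from itertools import islice, product


def make_encoding(messages, actions, length):
    """A one-to-one map from messages to action sequences of a given length, with its inverse.

    Any injective map works for the construction; here we simply take the first
    len(messages) sequences of A^length in lexicographic order.
    """
    codewords = list(islice(product(actions, repeat=length), len(messages)))
    if len(codewords) < len(messages):
        raise ValueError("not enough distinct action sequences of this length")
    encode = dict(zip(messages, codewords))
    decode = {word: message for message, word in encode.items()}
    return encode, decode


# Hypothetical sets: L_T = {l1, l2, l3} plus the empty message, |M_S| = 2, so K_2 = 1 + 3 = 4.
messages = ["l1", "l2", "l3", "EMPTY"]
alpha_2, alpha_2_inverse = make_encoding(messages, actions=("a0", "a1"), length=4)
print(alpha_2["l3"])  # ('a0', 'a0', 'a1', 'a0')
```

Decoding is what allows each player to infer the other's message from the observed actions, as required in Phase 2.2 and Phase 4.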

We let $\pi_1 \in \Delta(L_T)$ and $\pi_2 \in \Delta(M_S)$ be arbitrary distributions with full support.

We now proceed to the definition of a strategy profile $(\sigma_\delta, \tau_\delta)$. The definition involves additional parameters $\theta$, $\zeta$, and $\psi^0_i, \psi^1_i$ ($i = 1, 2$), all in $(0, 1)$, which will be chosen later. We first define the profile only at information sets that are not ruled out by the definition of $(\sigma_\delta, \tau_\delta)$ at earlier information sets. The definition of $(\sigma_\delta, \tau_\delta)$ at information sets that are reached with probability zero will be provided afterwards.

Phase 1 It lasts $K_1$ stages. Player 1 plays the sequence $\alpha_1(l_S)$ of actions, and player 2 plays the sequence $\beta_1(m_T)$ of actions.

Phase 2 It is divided into two subphases, Phase 2.1 and Phase 2.2.

Phase 2.1 It lasts $K_2$ stages. Player 1 first draws a message $\lambda_1 \in L_T \cup \{\square\}$. The probability assigned to $\square$ (resp. to each $l'_T \in L_T$) is equal to $1 - \zeta$ (resp. $\zeta \times \varepsilon_1(l'_T \mid l_T)$). Symmetrically, player 2 draws a message $\mu_1 \in M_S \cup \{\square\}$. The probability assigned to $\square$ (resp. to each $m'_S \in M_S$) is equal to $1 - \zeta$ (resp. $\zeta \times \varepsilon_2(m'_S \mid m_S)$). In that phase, the players play the sequences $\alpha_2(\lambda_1)$ and $\beta_2(\mu_1)$ of actions.

Phase 2.2 It lasts $\nu := \lceil \ln(1 - \theta) / \ln \delta \rceil$ stages. Player 1 infers $\mu_1$ from the actions played by player 2 in Phase 2.1, and plays the sequence $a^\nu(x^*_{l_S, \mu_1})$ of actions. Meanwhile, player 2 infers $\lambda_1$ from the actions played by player 1 in Phase 2.1, and plays the sequence $b^\nu(y^*_{m_T, \lambda_1})$ of actions.
Phase 3 It lasts $K_2$ stages. Player 1 draws a message $\lambda_2 \in L_T$, whose distribution depends on $\lambda_1$. If $\lambda_1 = \square$, the probability assigned to $l_T$ (resp. to each $l'_T \neq l_T$) is equal to $(1 - \psi^0_1) + \psi^0_1 \times \pi_1(l_T)$ (resp. $\psi^0_1 \times \pi_1(l'_T)$). If $\lambda_1 \neq \square$, the probability assigned to $\lambda_2$ is equal to $(1 - \psi^1_1) + \psi^1_1 \times \pi_1(\lambda_2)$ if $\lambda_2 = l_T$, and it is equal to $\psi^1_1 \times \pi_1(\lambda_2)$ otherwise. Player 2 draws a message $\mu_2 \in M_S$. The distribution of $\mu_2$ depends on $\mu_1$, and is obtained as for player 1.

In this phase, the players play the sequences $\alpha_2(\lambda_2)$ and $\beta_2(\mu_2)$ of actions.

Phase 4 It contains all remaining stages. We denote by $N = K_1 + 2K_2 + \nu + 1$ its first stage. Let $h = (a_n(h), b_n(h))_{n < N} \in (A \times B)^{N-1}$ be the history of moves up to stage $N$. Player 2 infers from $h$ the belief $p_n(h)$ held by player 1 in each stage $n < N$ along $h$. In this computation, the report of player 1 in Phase 1 is assumed to be truthful. For $n < N$, the belief $q_n(h) \in \Delta(T \times L_T)$ is defined in a symmetric way. The players compute

$$c_1(h) = \frac{1}{\delta^{N-1}} \sum_n (1 - \delta)\, \delta^{n-1} c\big(p_n(h), a_n(h)\big) \quad \text{and} \quad c_2(h) = \frac{1}{\delta^{N-1}} \sum_n (1 - \delta)\, \delta^{n-1} c\big(q_n(h), b_n(h)\big),$$

where the sums are taken over all stages $n$ of Phases 1, 2.1 and 3. Players then start playing according to the equilibrium profile of the semi-ignorant game $\Gamma(p_N(h), q_N(h))$, with payoff $\big(u_*(p_N(h)) + c_1(h),\ v_*(q_N(h)) + c_2(h)\big)$.
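The compensation terms $c_1(h)$ and $c_2(h)$ are discounted sums of stage costs, rescaled to continuation-payoff units. Here is a minimal Python sketch for $c_1(h)$, assuming the stage costs $c(p_n(h), a_n(h))$ have already been computed from the inferred beliefs. The function name and the numbers are hypothetical.

```python
def compensation(stage_costs, first_stage_phase4, delta):
    """c_1(h) = delta**-(N-1) * sum_n (1 - delta) * delta**(n-1) * c(p_n(h), a_n(h)).

    stage_costs maps each stage n of Phases 1, 2.1 and 3 (1-based) to the cost
    c(p_n(h), a_n(h)); stages of Phase 2.2 are simply left out of the dict.
    """
    n_first = first_stage_phase4
    past = sum((1 - delta) * delta ** (n - 1) * c for n, c in stage_costs.items())
    return past / delta ** (n_first - 1)


# Hypothetical costs at stages 1 and 3; Phase 4 starts at stage N = 5.
c1 = compensation({1: 0.1, 3: 0.2}, first_stage_phase4=5, delta=0.5)
print(c1)
```

Multiplying $c_1(h)$ by $\delta^{N-1}$ recovers the discounted cost incurred before Phase 4, which is the amount the continuation payoff has to offset.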

Some interpretation may be helpful. In Phase 2.1, the message $\square$ is uninformative and is sent with high probability. In Phase 3, the level of noise in the message sent by player 1 depends on player 1's first message, and is either $\psi^1_1$ if the first message was informative, or $\psi^0_1$ otherwise.
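The message distributions of Phases 2.1 and 3 can be written down directly. In the sketch below, `EMPTY` stands for $\square$, `psi0` and `psi1` are our names for the noise levels used after an uninformative and an informative first message, and all numbers are hypothetical.

```python
EMPTY = "EMPTY"  # stands for the uninformative message (the box symbol in the text)


def first_message_distribution(l_T, zeta, eps1):
    """Phase 2.1: P(EMPTY) = 1 - zeta and P(l') = zeta * eps1[l_T][l'] for l' in L_T."""
    dist = {EMPTY: 1 - zeta}
    for l_prime, prob in eps1[l_T].items():
        dist[l_prime] = zeta * prob
    return dist


def second_message_distribution(l_T, lambda_1, psi0, psi1, pi1):
    """Phase 3: noisy report of l_T, with noise psi0 after EMPTY and psi1 otherwise."""
    psi = psi0 if lambda_1 == EMPTY else psi1
    dist = {l_prime: psi * prob for l_prime, prob in pi1.items()}
    dist[l_T] += 1 - psi  # the true signal gets the remaining mass
    return dist


# Hypothetical parameters with L_T = {"x", "y"}.
eps1 = {"x": {"x": 0.8, "y": 0.2}, "y": {"x": 0.3, "y": 0.7}}
pi1 = {"x": 0.5, "y": 0.5}
d1 = first_message_distribution("x", zeta=0.1, eps1=eps1)
d2 = second_message_distribution("x", EMPTY, psi0=0.2, psi1=0.6, pi1=pi1)
print(d1, d2)
```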

B.3 Equilibrium Strategies — Parameter values 

We now fix the parameter values, starting with $\theta$. As $\delta \to 1$, the discounted weight of the $\nu$ stages of Phase 2.2 converges to $\theta$. Thus, $\theta$ is a measure of the contribution of the checking Phase 2.2 to the total payoff. We choose $\theta \in (0, 1)$ small enough so that the following set of inequalities is satisfied:

$$(1 - \theta)\, \mathbf{E}\big[u_*(p_{l_S, \mathbf{m}_S}) \mid \mathbf{l}_S = l_S, \mu_1 = m_S\big] > u_*\big(p(\cdot \mid \mathbf{l}_S = l_S, \mu_1 = m_S)\big), \quad \text{for all } l_S \in L_S,\ m_S \in M_S, \qquad (16)$$

$$(1 - \theta)\, u_{**} > \gamma^1, \qquad (17)$$

together with the symmetric conditions for player 2.

Footnote: The message $\square$ is uninformative since its probability does not depend on signals.
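The claim that the discounted weight of Phase 2.2 approaches $\theta$ can be checked numerically. The sketch assumes the length $\nu := \lceil \ln(1-\theta)/\ln \delta \rceil$ and a fixed first stage $K_1 + K_2 + 1$ for Phase 2.2, with hypothetical $K_1 = K_2 = 3$.

```python
import math


def phase_22_length(theta, delta):
    """nu = ceil(ln(1 - theta) / ln(delta)), the least nu with delta**nu <= 1 - theta."""
    return math.ceil(math.log(1 - theta) / math.log(delta))


def phase_22_weight(theta, delta, first_stage):
    """Discounted weight sum_n (1 - delta) delta**(n-1) over the nu stages of Phase 2.2."""
    nu = phase_22_length(theta, delta)
    return delta ** (first_stage - 1) * (1 - delta ** nu)


# Phase 2.2 starts at stage K_1 + K_2 + 1; hypothetical K_1 = K_2 = 3 gives stage 7.
for delta in (0.9, 0.999, 0.99999):
    print(delta, phase_22_weight(0.1, delta, first_stage=7))
```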

By construction, the conditional distribution of $\mathbf{m}_s$ given $(\mathbf{l}_s, \mu_1) = (l_s, m_s)$ is independent of $\zeta$, and only depends on the fixed map $\varepsilon_2$. Since $\varepsilon_2(\cdot \mid m'_s)$ has full support for each $m'_s$, this conditional distribution has full support. Therefore, the residual information held by player 2 is still valuable to player 1, whatever the value of $\mu_1 \in \{D\} \cup M_s$. In particular, (16) holds with $\theta = 0$, and thus also for $\theta > 0$ small enough. Because $\gamma^1 < u_{**}$, condition (17) is also satisfied for small $\theta$.

Condition (16) ensures that, even if payoffs in phase 2.2 are very low, the weight $\theta$ of phase 2.2 is so small that the residual value of the information held by player 2 still offsets the cost of playing the prescribed sequence in phase 2.2. In other words, when in phase 2.2, player 1 prefers to play the prescribed sequence of actions rather than switch to an optimal action.

Observe that with probability $1-\zeta$, player 1 receives no information prior to phase 3. Hence, for small $\zeta > 0$, the bulk of information exchange takes place in phase 3. Condition (17) ensures that, even if all information exchange is postponed to phase 3, payoffs as high as $\gamma^1$ can be implemented.

Choose $\zeta \in (0,1)$ small enough so that the two inequalities

$$(1-\zeta)u_* + \zeta u_{**} < \gamma^1 < (1-\zeta)(1-\theta)u_{**} \tag{18}$$

hold, together with the analogous inequalities for player 2.
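As $\zeta \to 0$, the left-hand side of (18) tends to $u_*$ and the right-hand side tends to $(1-\theta)u_{**}$. Both inequalities therefore hold for small $\zeta$ provided $u_* < \gamma^1$ (which the first inequality requires) and condition (17) holds. The sketch below halves $\zeta$ until (18) is satisfied. The numeric values of $u_*$, $u_{**}$, $\gamma^1$ and $\theta$ are hypothetical.

```python
def choose_zeta(u_low, u_high, gamma, theta, zeta=0.5, max_halvings=60):
    """Halve zeta until (1-zeta)*u_low + zeta*u_high < gamma < (1-zeta)*(1-theta)*u_high.

    u_low and u_high stand for u_* and u_** (hypothetical numbers).
    Requires u_low < gamma < (1 - theta) * u_high, the limit of (18) as zeta -> 0.
    """
    if not (u_low < gamma < (1 - theta) * u_high):
        raise ValueError("need u_low < gamma < (1 - theta) * u_high")
    for _ in range(max_halvings):
        lhs = (1 - zeta) * u_low + zeta * u_high
        rhs = (1 - zeta) * (1 - theta) * u_high
        if lhs < gamma < rhs:
            return zeta
        zeta /= 2
    raise RuntimeError("no admissible zeta found")

if __name__ == "__main__":
    print(choose_zeta(u_low=0.4, u_high=0.9, gamma=0.6, theta=0.1))
```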

In phase 2.2, the (conditional) optimal payoff of player 1 is $u_*$ if $\mu_1 = D$, and does not exceed $u_{**}$ if $\mu_1 \neq D$. The first inequality ensures that the probability $1-\zeta$ of not disclosing information in phase 2.1 ($\mu_1 = D$) is so high that the expectation of the optimal payoff given $\mu_1$ is below $\gamma^1$. That is, additional information must be disclosed in phase 3 in order to implement $\gamma^1$. This inequality, together with (17), will allow us to adjust the remaining parameter values so that the overall payoff is $\gamma^1$.

The second inequality in (18) does not play a critical role.



We now choose $\psi^* \in (0,1)$ small enough so that, for every $l_s \in L_s$ and $m_s \in M_s$,

$$(1-\theta)\,\mathbf{E}\big[u^*(p(\cdot \mid \mathbf{l}_s, \mu_1, \mu_2)) \,\big|\, \mathbf{l}_s = l_s,\ \mu_1 = m_s\big] > u^*\big(p(\cdot \mid \mathbf{l}_s = l_s,\ \mu_1 = m_s)\big). \tag{19}$$

In this expression, $p(\cdot \mid l_s, \mu_1, \mu_2)$ is the belief held by player 1 at the beginning of phase 4, after having received the two messages $\mu_1, \mu_2$ of player 2. The left-hand side of (19) is continuous with respect to $\psi^*$. For $\psi^* = 0$, $\mu_2$ is equal to $\mathbf{m}_s$ with probability 1, and (19) therefore holds by (16). Hence (19) holds for $\psi^* > 0$ small enough.



Observe that the parameter values $\zeta$, $\theta$ and $\psi^*$ are independent of the discount factor. The last parameter, $\psi_D$, is chosen such that the expected payoff of player 1 is $\gamma^1$. We first argue that, for a given $\psi_D$, the limit discounted payoff of player 1 as $\delta \to 1$ is equal to

$$\theta\,\mathbf{E}\big[u(p(\cdot \mid \mathbf{l}_s, \mu_1), x^*)\big] + (1-\theta)\,\mathbf{E}[u^*(p_N)]. \tag{20}$$

Here is why. The contribution of Phases 1, 2.1 and 3 vanishes, as the length of these phases is fixed independently of $\delta$. The expected payoff in phase 2.2 converges to $\mathbf{E}[u(p(\cdot \mid \mathbf{l}_s, \mu_1), x^*)]$. Finally, for a fixed $\delta$, the expected continuation payoff from stage $N$ is equal to $\mathbf{E}[u^*(p_N) + c_1(h_N)]$. As Lemma 6 will show, $\mathbf{E}[c_1(h_N)]$ converges to 0.

Observe that for $\psi_D = 0$, and following $\mu_1 = D$, the message $\mu_2$ of player 2 is non-informative. Thus, conditional on the event $\mu_1 = D$, player 2 does not disclose information prior to phase 4. Hence, for $\psi_D = 0$, the expression (20) does not exceed $(1-\zeta)u_* + \zeta u_{**}$, which by (18) is less than $\gamma^1$. If $\psi_D = 1$, then following $\mu_1 = D$ the message $\mu_2$ is fully informative, and the expression (20) is at least equal to $\zeta u_* + (1-\zeta)\big(\theta u_* + (1-\theta)u_{**}\big)$, which exceeds $\gamma^1$ by (18). It follows that for $\delta$ high enough, say $\delta > \delta_1$, there exists $\psi_D(\delta) \in (0,1)$ such that the discounted payoff of player 1 is equal to $\gamma^1$, and such that $\psi_D(1) := \lim_{\delta \to 1} \psi_D(\delta) \in (0,1)$.
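The existence of $\psi_D(\delta)$ is an intermediate value argument. Assuming player 1's payoff is continuous in $\psi_D$, it lies below $\gamma^1$ at $\psi_D = 0$ and above it at $\psi_D = 1$, so bisection can locate a value in between. In the sketch, `toy_payoff` is a hypothetical linear interpolation between the two bounds computed in the text. It stands in for the actual discounted payoff, which is not available in closed form here, and all numbers are illustrative.

```python
def solve_psi_d(payoff, gamma, tol=1e-10):
    """Find psi in (0, 1) with payoff(psi) == gamma by bisection.

    payoff: continuous function on [0, 1] with payoff(0) < gamma < payoff(1)
    (hypothetical stand-in for player 1's payoff as a function of psi_D).
    """
    lo, hi = 0.0, 1.0
    if not payoff(lo) < gamma < payoff(hi):
        raise ValueError("gamma must lie strictly between payoff(0) and payoff(1)")
    # Invariant: payoff(lo) < gamma <= payoff(hi).
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if payoff(mid) < gamma:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical inputs: u_* = 0.4, u_** = 0.9, zeta = 0.25, theta = 0.1.
U_LOW, U_HIGH, ZETA, THETA = 0.4, 0.9, 0.25, 0.1
AT_ZERO = (1 - ZETA) * U_LOW + ZETA * U_HIGH  # upper bound when psi_D = 0
AT_ONE = ZETA * U_LOW + (1 - ZETA) * (THETA * U_LOW + (1 - THETA) * U_HIGH)  # lower bound when psi_D = 1

def toy_payoff(psi):
    """Linear stand-in between the two bounds (not the actual equilibrium payoff)."""
    return (1 - psi) * AT_ZERO + psi * AT_ONE

if __name__ == "__main__":
    print(solve_psi_d(toy_payoff, gamma=0.6))
```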

We conclude this section by discussing how high $\delta$ should be for the profile $(\sigma_\delta, \tau_\delta)$ to be well defined, and by discussing beliefs and actions off the equilibrium path. We first argue that the costs $c_1(h)$ and $c_2(h)$ are small.

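Lemma 6 There is $c > 0$ such that for every $\delta > \delta_1$ and every $h \in H_N$, one has

$$c_1(h) \le (1-\delta)c.$$

Proof. Payoffs are bounded by 1. Moreover, $N - 1 = K_1 + 2K_2 + \nu$, where $\nu = \lceil \ln(1-\theta)/\ln\delta \rceil \le \ln(1-\theta)/\ln\delta + 1$. Hence

$$c_1(h) \le (K_1 + 2K_2)\frac{1-\delta}{\delta^{N-1}} = (K_1 + 2K_2)\frac{1-\delta}{\delta^{K_1+2K_2+\nu}} \le (K_1 + 2K_2)\frac{1-\delta}{\delta^{K_1+2K_2+1}\,\delta^{\ln(1-\theta)/\ln\delta}} = (K_1 + 2K_2)\frac{1-\delta}{\delta^{K_1+2K_2+1}(1-\theta)},$$

and the result follows with $c = (K_1 + 2K_2)/\big(\delta_1^{K_1+2K_2+1}(1-\theta)\big)$. ■

We here abuse notation, since $N \to +\infty$ as $\delta \to 1$. However, the limit of $\mathbf{E}[u^*(p_N)]$ is well defined.

Because the approximation of $x^*$ by $x_\delta(a(x^*))$ becomes perfectly accurate as $\delta \to 1$.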
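The bound of Lemma 6 can be sanity-checked numerically. The sketch evaluates the first bound in the proof, $(K_1+2K_2)(1-\delta)/\delta^{N-1}$, with $\nu = \lceil \ln(1-\theta)/\ln\delta \rceil$ and $N = K_1+2K_2+\nu+1$. It compares this bound with $(1-\delta)c$ for $c = (K_1+2K_2)/\big(\delta_1^{K_1+2K_2+1}(1-\theta)\big)$. That value of $c$ is one admissible choice, read off the last step of the proof. The values of $K_1$, $K_2$, $\theta$ and $\delta_1$ are hypothetical.

```python
import math

def lemma6_check(K1, K2, theta, delta1, deltas):
    """For each d in deltas (delta1 <= d < 1), compare
    (K1 + 2*K2) * (1 - d) / d**(N - 1) with (1 - d) * c, where
    nu = ceil(ln(1 - theta) / ln d), N = K1 + 2*K2 + nu + 1 and
    c = (K1 + 2*K2) / (delta1**(K1 + 2*K2 + 1) * (1 - theta)).
    Returns tuples (d, bound, (1 - d) * c, bound <= (1 - d) * c)."""
    K = K1 + 2 * K2
    c = K / (delta1 ** (K + 1) * (1 - theta))
    results = []
    for d in deltas:
        if not delta1 <= d < 1:
            raise ValueError("each delta must lie in [delta1, 1)")
        nu = math.ceil(math.log(1 - theta) / math.log(d))
        N = K + nu + 1
        bound = K * (1 - d) / d ** (N - 1)
        results.append((d, bound, (1 - d) * c, bound <= (1 - d) * c))
    return results

if __name__ == "__main__":
    for row in lemma6_check(K1=2, K2=3, theta=0.1, delta1=0.9,
                            deltas=[0.9, 0.95, 0.99, 0.999]):
        print(row)
```

The last column should always be `True`, and the third column shrinks linearly in $1-\delta$, which is the content of the lemma.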

For $\delta_1 < \delta \le 1$, denote by $\mathcal{P}(\delta)$ the support of $p_N$ when $\psi_D$ is set to $\psi_D(\delta)$, and define $\mathcal{Q}(\delta)$ in a symmetric way. Since $\pi^2$ and $\varepsilon_2(\cdot \mid m_s)$ have full support, and since $\psi^*, \psi_D(1) \in (0,1)$, one has $p_N \in \Delta(S \times M_s)$ with probability 1.

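Because $\mathcal{P}(1)$ and $\mathcal{Q}(1)$ are finite sets, and by Proposition 1, there are $\delta_2 < 1$, $\varepsilon > 0$, and neighborhoods $V(p)$ of $p \in \mathcal{P}(1)$ and $V(q)$ of $q \in \mathcal{Q}(1)$ with the following property: for $\delta > \delta_2$, any payoff in $[u^*(p'), u^*(p') + \varepsilon] \times [v^*(q'), v^*(q') + \varepsilon]$ is a sequential equilibrium payoff of $\Gamma(p', q')$, for every $p \in \mathcal{P}(1)$, $p' \in V(p)$, and $q \in \mathcal{Q}(1)$, $q' \in V(q)$.

In addition, we choose the neighborhoods $V(p)$ and $V(q)$ small enough, and $C > 0$, so that the conclusion of Proposition 2 holds for every $p \in \mathcal{P}(1)$, $p' \in V(p)$, and $q \in \mathcal{Q}(1)$, $q' \in V(q)$.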

We choose $\delta_3 < 1$ high enough so that the following conditions are met for each $\delta > \delta_3$: (i) every $p' \in \mathcal{P}(\delta)$ belongs to $V(p)$ for some $p \in \mathcal{P}(1)$; (ii) $(1-\delta)c < \varepsilon$.

For $\delta > \delta_3$, the profile $(\sigma_\delta, \tau_\delta)$ is then well defined at any information set that is not ruled out by the definition of $(\sigma_\delta, \tau_\delta)$ at earlier stages.

Consider now an information set $I_h$ of player 1 that is reached with probability 0, and assume that the information set $I_{h'}$ is reached with positive probability, where $h'$ is the longest proper prefix of $h$.

If the sequence $h$ of actions has probability zero, then we let beliefs at $I_h$ and at all subsequent information sets coincide with the belief held at $I_{h'}$, and player 1 repeats the action that is optimal at $I_{h'}$.

Assume now that the sequence $h$ has positive probability. This corresponds to the case where player 1 misreported in Phase 1 and played consistently with his report afterwards. Then the belief of player 1 at $I_h$ is well defined by Bayes' rule (and is independent of player 1's strategy), and only assigns positive probability to information sets of player 2 that are reached with positive probability under $\tau_\delta$. We let $\sigma_\delta$ play at $I_h$ a best reply to $\tau_\delta$.

By construction, sequential rationality holds at any information set $I_h$ that is reached with probability zero. One can verify that beliefs are consistent with $(\sigma_\delta, \tau_\delta)$; we omit the proof.






B.4 Equilibrium properties 

We claim that the profile $(\sigma_\delta, \tau_\delta)$ is a sequential equilibrium profile for $\delta < 1$ high enough.

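Let $\eta > 0$ be small enough so that

$$\mathbf{E}_{\varepsilon_2, x^*}[l_s \to k] < \mathbf{E}_{\varepsilon_2, x^*}[l_s \to l_s] - 2\eta \quad \text{for every } l_s, k \in L_s,\ l_s \neq k,$$

and choose $\delta_4 < 1$ such that

$$\mathbf{E}_{\varepsilon_2, x_\delta(a(x^*))}[l_s \to k] < \mathbf{E}_{\varepsilon_2, x_\delta(a(x^*))}[l_s \to l_s] - \eta \quad \text{for every } l_s, k \in L_s,\ l_s \neq k, \text{ and } \delta > \delta_4.$$

We finally choose $\delta_5 < 1$ such that $1 - \delta^{K_1+K_2} + (1-\delta)c < \eta\,\delta^{K_1+K_2}$ for each $\delta > \delta_5$.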
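The left-hand side of the condition defining $\delta_5$ decreases in $\delta$ and tends to 0, while the right-hand side increases to $\eta$. The condition therefore holds on a whole interval $(\delta_5, 1)$. The sketch below locates the threshold on a finite grid. The grid and the values of $K_1$, $K_2$, $\eta$ and $c$ are hypothetical, and a grid search only locates $\delta_5$ up to the grid spacing.

```python
def delta5_condition(delta, K1, K2, eta, c):
    """Check 1 - delta**(K1+K2) + (1 - delta)*c < eta * delta**(K1+K2)."""
    d = delta ** (K1 + K2)
    return 1 - d + (1 - delta) * c < eta * d

def find_delta5(K1, K2, eta, c, grid):
    """Smallest grid point from which the condition holds at every larger grid point.

    Returns None if the condition fails at the largest grid point."""
    candidate = None
    for delta in sorted(grid, reverse=True):
        if not delta5_condition(delta, K1, K2, eta, c):
            break
        candidate = delta
    return candidate

grid = [1 - 10 ** (-k / 4) for k in range(4, 25)]  # 0.9, ..., 1 - 1e-6

if __name__ == "__main__":
    print(find_delta5(K1=2, K2=3, eta=0.05, c=30.0, grid=grid))
```

By monotonicity, scanning down from the top of the grid and stopping at the first failure gives the same answer as checking every point.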

We now verify that $(\sigma_\delta, \tau_\delta)$ is a sequential equilibrium as soon as $\delta > \max\{\delta_4, \delta_5\}$. It is sufficient to check that sequential rationality holds at any information set that is reached with positive probability. Let such an information set $I_{l,h}$ be given, and let $n$ be the stage to which $I_{l,h}$ belongs. If stage $n$ belongs to phase 4, then sequential rationality at $I_{l,h}$ follows because continuation strategies in phase 4 form a sequential equilibrium of the associated self-ignorant game. Assume then that $n < N$.

We will make use of the following observation, which holds because $\varepsilon_1(\cdot)$, $\varepsilon_2(\cdot)$, $\pi^1$ and $\pi^2$ have full support. If $I_{l_s,l_t,h}$ is reached with positive probability, then the set of actions played with positive probability at $I_{l_s,l_t,h}$ does not depend on $l_t$. Therefore, the information set $I_{l_s,l'_t,h}$ is also reached with positive probability, for every $l'_t \in L_t$. We note that the compensation made in phase 4 implies that player 1 is indifferent at $I_{l_s,l_t,h}$ between all actions that are played with positive probability. One thus simply needs to check that player 1 cannot increase his continuation payoff by playing some other action $a$.

Assume first that $n$ belongs to phase 2.1, 2.2 or 3. In that case, the set of actions that are played at $I_{l,h}$ does not depend on $l$. Hence, when playing $a$, player 1 triggers myopic play by player 2, and player 1's overall payoff in that case does not exceed

$$(1-\delta)\,u^*(p_n) + \delta\,\mathbf{E}[u^*(p_{n+1}) \mid l, h].$$

On the other hand, the expected continuation payoff of player 1 at $I_{l,h}$ is at least $\delta^{N-n}\,\mathbf{E}[u^*(p_N) \mid l, h]$. Sequential rationality then follows from the choice of parameters.
Assume finally that stage $n$ belongs to phase 1. Again, it is not profitable to switch to an action that triggers myopic play by player 2. What if player 1, instead of reporting $l_s$, chooses to report $k \neq l_s$? Then, as above, the choice of parameters ensures that it is optimal for player 1 to play consistently with $k$, at least until phase 4. Such a deviation yields a payoff (discounted back to $h$) of at most

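$$\delta^{-n}\Big(\delta^{K_1+K_2}\,\mathbf{E}_{\varepsilon_2, x_\delta(a(x^*))}[l_s \to k] + (1-\delta)\big(1 + \cdots + \delta^{K_1+K_2-1}\big) + \delta^{N}\,\mathbf{E}\big[u^*(p_N) + (1-\delta)c\big]\Big).$$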

On the other hand, player 1's continuation payoff when reporting truthfully is at least

$$\delta^{-n}\Big(\delta^{K_1+K_2}\,\mathbf{E}_{\varepsilon_2, x_\delta(a(x^*))}[l_s \to l_s] + \delta^{N}\,\mathbf{E}\big[u^*(p_N) + (1-\delta)c\big]\Big).$$

We stress that the distribution of $p_N$ is the same in both expressions, because the distribution of $(\mu_1, \mu_2)$ does not depend on player 1's report. The result follows by the choice of $\delta_4$ and $\delta_5$.
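The last step can be spelled out as a chain of inequalities. This assumes the deviation bound has the form $\delta^{-n}\big(\delta^{K_1+K_2}\mathbf{E}_\delta[l_s \to k] + (1-\delta)(1+\cdots+\delta^{K_1+K_2-1}) + \delta^N \mathbf{E}[u^*(p_N) + (1-\delta)c]\big)$. It also assumes the truthful bound has the same form, with $l_s \to l_s$ in place of $l_s \to k$ and without the middle term. Here $\mathbf{E}_\delta$ abbreviates $\mathbf{E}_{\varepsilon_2, x_\delta(a(x^*))}$. The derivation uses the identity $(1-\delta)(1+\cdots+\delta^{K_1+K_2-1}) = 1-\delta^{K_1+K_2}$.

```latex
\begin{align*}
\text{truthful} - \text{deviation}
 &\ge \delta^{-n}\Big(\delta^{K_1+K_2}\big(\mathbf{E}_\delta[l_s \to l_s] - \mathbf{E}_\delta[l_s \to k]\big) - \big(1 - \delta^{K_1+K_2}\big)\Big) \\
 &> \delta^{-n}\Big(\eta\,\delta^{K_1+K_2} - \big(1 - \delta^{K_1+K_2}\big)\Big) && (\delta > \delta_4) \\
 &> \delta^{-n}(1-\delta)\,c \;\ge\; 0 && (\delta > \delta_5).
\end{align*}
```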